APM1 biallelic markers and uses thereof

ABSTRACT

The invention provides novel APM1 genomic sequences, polypeptides, antibodies, and polynucleotides including biallelic markers derived from the APM1 locus. Primers hybridizing to regions flanking these biallelic markers are also provided. This invention also provides polynucleotides and methods suitable for genotyping a nucleic acid containing sample for one or more biallelic markers of the invention. Additionally, the invention provides methods to detect a statistical correlation between a biallelic marker allele and a phenotype and/or between a biallelic marker haplotype and a phenotype. Further, the invention provides diagnostic methods for early detection of obesity-related disorders.

This application is a continuation of U.S. patent application Ser. No. 10/376,460, filed Feb. 28, 2003, which is a divisional of U.S. patent application Ser. No. 09/569,852, filed May 10, 2000, now U.S. Pat. No. 6,582,909, which is a continuation-in-part of International Patent Application No. PCT/IB99/01858, filed Nov. 4, 1999 and U.S. Non-Provisional patent application Ser. No. 09/434,848, filed Nov. 4, 1999, now abandoned, both of which claim priority to U.S. Provisional Patent Application Ser. No. 60/119,593, filed Feb. 10, 1999, and U.S. Provisional Patent Application Ser. No. 60/107,113, filed Nov. 4, 1998. All of the above-referenced applications are hereby incorporated by reference herein in their entireties, including any figures, tables, nucleic acid sequences, amino acid sequences, or drawings.

FIELD OF THE INVENTION

The invention concerns the genomic and cDNA sequences of the APM1 gene, as well as methods and kits for detecting these polynucleotides. The invention also concerns the regulatory regions, particularly the promoter region of the APM1 gene. The invention comprises biallelic markers of the APM1 gene which can be useful for diagnosis of obesity or disorders related to obesity.

BACKGROUND

Obesity is a public health problem that is both serious and widespread. One-third of the population in industrialized countries has an excess weight of at least 20% relative to the ideal weight. The phenomenon continues to worsen, particularly in regions of the globe where economies are modernizing. In the United States, the proportion of obese people rose from 25% at the end of the 1970s to 33% at the beginning of the 1990s.

Obesity considerably increases the risk of developing cardiovascular and metabolic diseases. It is estimated that if the entire population had an ideal weight, the risk of coronary insufficiency would decrease by 25% and that of cardiac insufficiency and of cerebral vascular accidents by 35%. Coronary insufficiency, atheromatous disease and cardiac insufficiency are at the forefront of the cardiovascular complications induced by obesity. For an excess weight greater than 30%, the incidence of coronary diseases is doubled in subjects less than 50 years old. Studies carried out for other diseases are equally significant. For an excess weight of 20%, the risk of high blood pressure is doubled. For an excess weight of 30%, the risk of developing non-insulin-dependent diabetes is tripled and the risk of hyperlipidemias is multiplied sixfold.

The list of diseases having onsets promoted by obesity is long: hyperuricemia (11.4% in obese subjects, compared with 3.4% in the general population), digestive pathologies, abnormalities in hepatic functions, and even certain cancers.

Whether the physiological changes in obesity are characterized by an increase in the number of adipose cells, or by an increase in the quantity of triglycerides stored in each adipose cell, or by both, this excess weight results mainly from an imbalance between the quantity of calories consumed and the quantity of calories used by the body. Some studies on the causes of this imbalance have focused on the mechanism of absorption of foods, and therefore on the molecules which control food intake and the feeling of satiety. Other studies have characterized the pathways through which the body uses its calories.

The treatments for obesity which have been proposed are of four types. 1) Food restriction is the most frequently used. Obese individuals are advised to change their dietary habits so as to consume fewer calories. This type of treatment is effective in the short term. However, the recidivation rate is very high. 2) Increased calorie use through physical exercise is also proposed. This treatment is ineffective when applied alone, but it does improve weight loss in subjects on a low-calorie diet. 3) Gastrointestinal surgery, which reduces the absorption of the calories ingested, is effective but has been virtually abandoned because of the side effects which it causes. 4) The medicinal approach uses either the anorexigenic action of molecules involved at the level of the central nervous system, or the effect of molecules which increase energy use by increasing the production of heat. The prototypes of this kind of molecule are the thyroid hormones that uncouple oxidative phosphorylations of the mitochondrial respiratory chain. The side effects and the toxicity of this type of treatment make its use dangerous. An approach that aims to reduce the absorption of dietary lipids by sequestering them in the lumen of the digestive tube is also in place. However, it induces physiological imbalances that are difficult to tolerate: deficiency in the absorption of fat-soluble vitamins, flatulence and steatorrhoea. Whatever the envisaged therapeutic approach, the treatments of obesity are all characterized by an extremely high recidivation rate.

The molecular mechanisms responsible for obesity in man are complex and involve genetic and environmental factors. Because of the low efficiency of the treatments known up until now, it is urgent to define the genetic mechanisms that determine obesity, so as to be able to develop better targeted medicaments.

More than 20 genes have been studied as possible candidates, either because they have been implicated in diseases of which obesity is one of the clinical manifestations, or because they are homologues of genes involved in obesity in animal models. Situated in the 7q31 chromosomal region, the OB gene is one of the most widely studied. Its product, leptin, is involved in the mechanisms of satiety. Leptin is a plasma protein of 16 kDa produced by the adipocytes under the action of various stimuli. Obese mice of the ob/ob type exhibit a deficiency in the leptin gene; this protein is undetectable in the plasma of these animals. The administration of leptin obtained by genetic engineering to ob/ob mice corrects their relative hyperphagia and allows normalization of their weight. This anorexigenic effect of leptin calls into play a receptor of the central nervous system: the ob receptor which belongs to the family of class 1 cytokine receptors. The ob receptor is deficient in obese mice of the db/db strain. The administration of leptin to these mice has no effect on their food intake and does not allow substantial reduction in their weight. The mechanisms by which the ob receptors transmit the signal for satiety are not precisely known. It is possible that neuropeptide Y is involved in this signaling pathway. It is important to specify at this stage that the ob receptors are not the only regulators of appetite. The Melanocortin 4 receptor is also involved since mice made deficient in this receptor are obese (Gura, (1997)).

The discovery of leptin and the characterization of the leptin receptor at the level of the central nervous system opened a new route for the search for medicaments against obesity. This model, however, rapidly proved disappointing. Indeed, with only one exception (Montague et al., (1997)), the genes encoding leptin or its ob receptor have proved to be normal in obese human subjects. Furthermore and paradoxically, the plasma concentrations of leptin, the satiety hormone, are abnormally high in most obese human subjects.

Clearly there remains a need for novel medicaments that are useful for reducing body weight in humans. Such pharmaceutical compositions advantageously would help to control obesity and thereby alleviate many of the cardiovascular consequences associated with this condition.

The human adipocyte-specific APM1 gene encodes a secretory protein of the adipose tissue and is likely to play a role in the pathogenesis of obesity. Knowledge of the APM1 genomic sequence, and particularly of both promoter and splice junction sequences, allows the design of novel diagnostics and therapeutic tools that act on lipid metabolism, and are useful for diagnosing and treating obesity disorders.

SUMMARY OF THE INVENTION

The present invention stems from the isolation and characterization of the genomic sequence of the APM1 gene including its regulatory regions and of the complete cDNA sequence encoding the APM1 protein. Oligonucleotide probes and primers hybridizing specifically with a genomic sequence of APM1 are also part of the invention. A further object of the invention consists of recombinant vectors comprising any of the nucleic acid sequences described in the present invention, and in particular of recombinant vectors comprising the promoter region of APM1 or a sequence encoding the APM1 protein, as well as cell hosts comprising said nucleic acid sequences or recombinant vectors. The invention also encompasses methods of screening of molecules which modulate or inhibit the expression of the APM1 gene. The invention is also directed to biallelic markers that are located within the APM1 genomic sequence, these biallelic markers representing useful tools in order to identify a statistically significant association between specific alleles of the APM1 gene and one or several disorders related to obesity. Further, the invention relates to the use of these biallelic marker associations to indicate people at risk for diseases, including obesity-related diseases, as well as to identify people who would be candidates or non-candidates for a drug treatment, or a clinical trial.

BRIEF DESCRIPTION OF THE FIGURES

FIG. 1 shows a map of the genomic organization of human Apm1 (Adipose Most Abundant Gene Transcript 1) and the location of the biallelic markers identified in the application.

FIGS. 2A, 2B, and 2C are a graphical representation of the effect of Apm1 polymorphisms on plasma lipid values in obese adolescent girls. The mean and 99.99% confidence interval are indicated as a solid and dotted line, respectively.

FIGS. 3A and 3B are a graphical representation of the effect of APM1 polymorphism on leptin/BMI relationship in obese adolescent girls.

FIGS. 4A and 4B are a graphical representation of the effect of APM1 polymorphism on FFA in obese adolescent girls.

FIGS. 5A and 5B are a graphical representation of the effect of APM1 polymorphism on respiratory quotient in obese adolescents.

FIG. 6 is a graphical representation of the effect of APM1 on leptin/BMI ratio in obese adolescent girls.

FIG. 7 is a graphical representation of the effect of APM1 polymorphism on glucose tolerance in obese adolescent girls.

FIG. 8 shows Apm1 function predicted from polymorphism and in vivo analysis.

BRIEF DESCRIPTION OF THE SEQUENCES PROVIDED IN THE SEQUENCE LISTING

SEQ ID NO: 1 contains a genomic sequence of APM1 comprising the 5′ regulatory region (upstream untranscribed region), the exons and introns, and the 3′ regulatory region (downstream untranscribed region).

SEQ ID NO: 2 contains a 5′ regulatory region (upstream untranscribed region) of the APM1 gene.

SEQ ID NO: 3 contains a 3′ regulatory region (downstream untranscribed region) of the APM1 gene.

SEQ ID NO: 4 contains a partial 5′ cDNA of APM1.

SEQ ID NO: 5 contains a complete human APM1 cDNA.

SEQ ID NO: 6 contains the APM1 protein encoded by the cDNA of SEQ ID NO: 5.

SEQ ID NO: 7 contains a primer containing the additional PU 5′ sequence described further in Example 2.

SEQ ID NO: 8 contains a primer containing the additional RP 5′ sequence described further in Example 2.

In accordance with the regulations relating to Sequence Listings, the following codes have been used in the Sequence Listing to indicate the locations of biallelic markers within the sequences and to identify each of the alleles present at the polymorphic base. The code “r” in the sequences indicates that one allele of the polymorphic base is a guanine, while the other allele is an adenine. The code “y” in the sequences indicates that one allele of the polymorphic base is a thymine, while the other allele is a cytosine. The code “m” in the sequences indicates that one allele of the polymorphic base is an adenine, while the other allele is a cytosine. The code “k” in the sequences indicates that one allele of the polymorphic base is a guanine, while the other allele is a thymine. The code “s” in the sequences indicates that one allele of the polymorphic base is a guanine, while the other allele is a cytosine. The code “w” in the sequences indicates that one allele of the polymorphic base is an adenine, while the other allele is a thymine. The nucleotide code of the original allele for each biallelic marker is the following:

Biallelic marker    Original allele
9-27-261            G
99-14387-129        A
9-12-48             T
9-12-124            T
9-12-355            G
9-12-428            A
99-14405-105        G
17-30-216           G
9-27-211            A
9-27-246            G
17-31-298           A
17-31-413           T
17-32-24            T
99-14387-50         C
99-14387-199        A
17-33-TGAGACT       none
17-34-860           G
17-34-915           G
17-35-71            C
17-35-306           G
17-36-47            G
17-36-120           C
17-37-629           A
17-37-811           G
17-38-349           C
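The allele codes described above can be illustrated with a minimal Python sketch. The function names and the example fragment are hypothetical and do not come from the Sequence Listing; only the six code-to-allele pairs are taken from the text.

```python
# Two-allele ambiguity codes exactly as listed in the specification:
# code -> (one allele, the other allele)
BIALLELIC_CODES = {
    "r": ("G", "A"),
    "y": ("T", "C"),
    "m": ("A", "C"),
    "k": ("G", "T"),
    "s": ("G", "C"),
    "w": ("A", "T"),
}


def find_biallelic_sites(sequence):
    """Return (0-based position, code, alleles) for each ambiguity code found."""
    sites = []
    for position, base in enumerate(sequence):
        code = base.lower()
        if code in BIALLELIC_CODES:
            sites.append((position, code, BIALLELIC_CODES[code]))
    return sites


def alternative_allele(code, original_allele):
    """Given a code and the original allele, return the other allele."""
    alleles = BIALLELIC_CODES[code.lower()]
    original = original_allele.upper()
    if original not in alleles:
        raise ValueError(f"{original} is not an allele of code {code!r}")
    return alleles[1] if alleles[0] == original else alleles[0]


# Hypothetical fragment, not taken from SEQ ID NO: 1.
example_fragment = "acgtrtgcaytt"
sites = find_biallelic_sites(example_fragment)
```

For a marker whose original allele is G at a site coded “r”, `alternative_allele("r", "G")` gives the adenine allele.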

DETAILED DESCRIPTION

The aim of the present invention is to provide polynucleotides derived from the APM1 gene, which are particularly useful to design suitable means for detecting the presence of this gene in a test sample or alternatively the APM1 mRNA molecules that are present in a test sample. The present invention also deals with polynucleotides involved in the expression of the APM1 gene and which can be used for designing means capable of modulating the expression of APM1. Other polynucleotides of the invention are useful to design suitable means to express a desired polynucleotide of interest. The present invention also encompasses biallelic markers of the APM1 gene, and their use, based on biallelic marker association studies, to indicate people at risk for diseases, including obesity-related diseases, as well as to identify people who would be candidates or non-candidates for a drug treatment, or a clinical trial.

Definitions

Before describing the invention in greater detail, the following definitions are set forth to illustrate and define the meaning and scope of the terms used to describe the invention herein.

The term “APM1 gene”, when used herein, encompasses genomic, mRNA and cDNA sequences encoding the APM1 protein, including the untranslated regulatory regions of the genomic DNA.

The term “heterologous protein”, when used herein, is intended to designate any protein or polypeptide other than the APM1 protein. More particularly, the heterologous protein is a compound which can be used as a marker in further experiments with an APM1 regulatory region.

The term “isolated” requires that the material be removed from its original environment (e.g., the natural environment if it is naturally occurring). For example, a naturally-occurring polynucleotide or polypeptide present in a living animal is not isolated, but the same polynucleotide or DNA or polypeptide, separated from some or all of the coexisting materials in the natural system, is isolated. Such polynucleotide could be part of a vector and/or such polynucleotide or polypeptide could be part of a composition, and still be isolated in that the vector or composition is not part of its natural environment.

Specifically excluded from the definition of “isolated” are: naturally-occurring chromosomes (such as chromosome spreads), artificial chromosome libraries, genomic libraries, and cDNA libraries that exist either as an in vitro nucleic acid preparation or as a transfected/transformed host cell preparation, wherein the host cells are either an in vitro heterogeneous preparation or plated as a heterogeneous population of single colonies. Also specifically excluded are the above libraries wherein a specified 5′ EST makes up less than 5% of the number of nucleic acid inserts in the vector molecules. Further specifically excluded are whole cell genomic DNA or whole cell RNA preparations (including said whole cell preparations which are mechanically sheared or enzymatically digested). Further specifically excluded are the above whole cell preparations as either an in vitro preparation or as a heterogeneous mixture separated by electrophoresis (including blot transfers of the same) wherein the polynucleotide of the invention has not further been separated from the heterologous polynucleotides in the electrophoresis medium (e.g., further separating by excising a single band from a heterogeneous band population in an agarose gel or nylon blot).

The term “purified” does not require absolute purity; rather, it is intended as a relative definition. Purification of starting material or natural material to at least one order of magnitude, preferably two or three orders, and more preferably four or five orders of magnitude is expressly contemplated. As an example, purification from 0.1% concentration to 10% concentration is two orders of magnitude.
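The arithmetic behind the example above can be checked with a short sketch. The function name is hypothetical; the calculation simply takes the base-10 logarithm of the ratio of final to starting concentration.

```python
import math


def purification_orders_of_magnitude(start_percent, end_percent):
    """Orders of magnitude gained when concentration rises from start to end."""
    if start_percent <= 0 or end_percent <= 0:
        raise ValueError("concentrations must be positive")
    return math.log10(end_percent / start_percent)


# The example from the text: 0.1% to 10% concentration.
orders = purification_orders_of_magnitude(0.1, 10.0)
```

Under this reading, going from 0.1% to 10% is a 100-fold enrichment, which corresponds to the two orders of magnitude stated in the definition.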

The term “purified polynucleotide” or “purified polynucleotide vector” is used herein to describe a polynucleotide or polynucleotide vector of the invention which has been separated from other compounds including, but not limited to other nucleic acids, carbohydrates, lipids and proteins (such as the enzymes used in the synthesis of the polynucleotide), or the separation of covalently closed polynucleotides from linear polynucleotides. A polynucleotide is substantially pure when at least about 50%, preferably 60 to 75% of a sample exhibits a single polynucleotide sequence and conformation (linear versus covalently closed). A substantially pure polynucleotide typically comprises about 50%, preferably 60 to 90% weight/weight of a nucleic acid sample, more usually about 95%, and preferably is over about 99% pure. Polynucleotide purity or homogeneity is indicated by a number of means well known in the art, such as agarose or polyacrylamide gel electrophoresis of a sample, followed by visualizing a single polynucleotide band upon staining the gel. For certain purposes higher resolution can be provided by using HPLC or other means well known in the art.

The term “polypeptide” refers to a polymer of amino acids without regard to the length of the polymer; thus, peptides, oligopeptides, and proteins are included within the definition of polypeptide. This term also does not specify or exclude post-translational modifications of polypeptides. For example, polypeptides that include the covalent attachment of glycosyl groups, acetyl groups, phosphate groups, lipid groups and the like are expressly encompassed by the term polypeptide. Also included within the definition are polypeptides which contain one or more analogs of an amino acid (including, for example, non-naturally occurring amino acids, amino acids which only occur naturally in an unrelated biological system, modified amino acids from mammalian systems, etc.), polypeptides with substituted linkages, as well as other modifications known in the art, both naturally occurring and non-naturally occurring.

The term “recombinant polypeptide” is used herein to refer to polypeptides that have been artificially designed and which comprise at least two polypeptide sequences that are not found as contiguous polypeptide sequences in their initial natural environment, or to refer to polypeptides which have been expressed from a recombinant polynucleotide.

The term “purified polypeptide” is used herein to describe a polypeptide of the invention that has been separated from other compounds including, but not limited to nucleic acids, lipids, carbohydrates and other proteins. A polypeptide is substantially pure when a sample contains at least about 50%, preferably 60 to 75%, of a single polypeptide sequence. A substantially pure polypeptide typically comprises about 50%, preferably 60 to 90%, more preferably 95 to 99% weight/weight of a protein sample. Polypeptide purity or homogeneity is indicated by a number of means well known in the art, such as agarose or polyacrylamide gel electrophoresis of a sample, followed by visualizing polypeptide bands upon staining the gel. For certain purposes higher resolution can be provided by using HPLC or other means well known in the art.

As used herein, the term “non-human animal” refers to any non-human vertebrate, birds and more usually mammals, preferably primates, farm animals such as swine, goats, sheep, donkeys, and horses, rabbits or rodents, more preferably rats or mice. As used herein, the term “animal” is used to refer to any vertebrate, preferably a mammal. Both the terms “animal” and “mammal” expressly embrace human subjects unless preceded with the term “non-human”.

As used herein, the term “antibody” refers to a polypeptide or group of polypeptides which are comprised of at least one binding domain, where an antibody binding domain is formed from the folding of variable domains of an antibody molecule to form three-dimensional binding spaces with an internal surface shape and charge distribution complementary to the features of an antigenic determinant of an antigen, which allows an immunological reaction with the antigen. Antibodies include recombinant proteins comprising the binding domains, as well as fragments, including Fab, Fab′, F(ab)₂, and F(ab′)₂ fragments.

As used herein, an “antigenic determinant” is the portion of an antigen molecule, in this case an APM1 polypeptide, that determines the specificity of the antigen-antibody reaction. An “epitope” refers to an antigenic determinant of a polypeptide. An epitope can comprise as few as 3 amino acids in a spatial conformation which is unique to the epitope. Generally an epitope consists of at least 6 such amino acids, and more usually at least 8-10 such amino acids. Methods for determining the amino acids which make up an epitope include x-ray crystallography, 2-dimensional nuclear magnetic resonance, and epitope mapping, e.g., the Pepscan method described by H. Mario Geysen et al. 1984. Proc. Natl. Acad. Sci. U.S.A. 81:3998-4002; PCT Publication No. WO 84/03564; and PCT Publication No. WO 84/03506.

Throughout the present specification, the expression “nucleotide sequence” may be employed to designate indifferently a polynucleotide or a nucleic acid. More precisely, the expression “nucleotide sequence” encompasses the nucleic material itself and is thus not restricted to the sequence information (i.e. the succession of letters chosen among the four base letters) that biochemically characterizes a specific DNA or RNA molecule.

As used interchangeably herein, the terms “nucleic acids”, “oligonucleotides”, and “polynucleotides” include RNA, DNA, or RNA/DNA hybrid sequences of more than one nucleotide in either single chain or duplex form. The term “nucleotide” is used herein as an adjective to describe molecules comprising RNA, DNA, or RNA/DNA hybrid sequences of any length in single-stranded or duplex form. The term “nucleotide” is also used herein as a noun to refer to individual nucleotides or varieties of nucleotides, meaning a molecule, or individual unit in a larger nucleic acid molecule, comprising a purine or pyrimidine, a ribose or deoxyribose sugar moiety, and a phosphate group, or phosphodiester linkage in the case of nucleotides within an oligonucleotide or polynucleotide. The term “nucleotide” is also used herein to encompass “modified nucleotides” which comprise at least one of the following modifications: (a) an alternative linking group, (b) an analogous form of purine, (c) an analogous form of pyrimidine, or (d) an analogous sugar. For examples of analogous linking groups, purines, pyrimidines, and sugars, see for example PCT publication No. WO 95/04064. The polynucleotide sequences of the invention may be prepared by any known method, including synthetic, recombinant, ex vivo generation, or a combination thereof, as well as utilizing any purification methods known in the art.

A “promoter” refers to a DNA sequence recognized by the synthetic machinery of the cell required to initiate the specific transcription of a gene.

A sequence which is “operably linked” to a regulatory sequence such as a promoter means that said regulatory element is in the correct location and orientation in relation to the nucleic acid to control RNA polymerase initiation and expression of the nucleic acid of interest. As used herein, the term “operably linked” refers to a linkage of polynucleotide elements in a functional relationship. For instance, a promoter or enhancer is operably linked to a coding sequence if it affects the transcription of the coding sequence. More precisely, two DNA molecules (such as a polynucleotide containing a promoter region and a polynucleotide encoding a desired polypeptide or polynucleotide) are said to be “operably linked” if the nature of the linkage between the two polynucleotides does not (1) result in the introduction of a frame-shift mutation or (2) interfere with the ability of the polynucleotide containing the promoter to direct the transcription of the coding polynucleotide.

The term “primer” denotes a specific oligonucleotide sequence which is complementary to a target nucleotide sequence and used to hybridize to the target nucleotide sequence. A primer serves as an initiation point for nucleotide polymerization catalyzed by either DNA polymerase, RNA polymerase or reverse transcriptase.

The term “probe” denotes a defined nucleic acid segment (or nucleotide analog segment, e.g., polynucleotide as defined hereinbelow) which can be used to identify a specific polynucleotide sequence present in samples, said nucleic acid segment comprising a nucleotide sequence complementary to the specific polynucleotide sequence to be identified.

The terms “trait” and “phenotype” are used interchangeably herein and refer to any visible, detectable or otherwise measurable property of an organism such as symptoms of, or susceptibility to a disease for example. Typically the terms “trait” or “phenotype” are used herein to refer to symptoms of, or susceptibility to, either obesity or disorders related to obesity, more particularly atherosclerosis, insulin resistance, hypertension, hyperlipidemia, hypertriglyceridemia, cardiovascular disease, microangiopathy in obese individuals with Type II diabetes, ocular lesions associated with microangiopathy in obese individuals with Type II diabetes, renal lesions associated with microangiopathy in obese individuals with Type II diabetes, and Syndrome X.

The term “allele” is used herein to refer to variants of a nucleotide sequence. A biallelic polymorphism has two forms. Typically the first identified allele is designated as the original allele whereas other alleles are designated as alternative alleles. Diploid organisms may be homozygous or heterozygous for an allelic form.

The term “heterozygosity rate” is used herein to refer to the incidence of individuals in a population which are heterozygous at a particular locus. In a biallelic system, the heterozygosity rate is on average equal to 2P_a(1−P_a), where P_a is the frequency of the less common allele. In order to be useful in genetic studies, a genetic marker should have an adequate level of heterozygosity to allow a reasonable probability that a randomly selected person will be heterozygous.
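The heterozygosity formula above can be written as a small Python function. This is a minimal sketch; the function name and the input check are assumptions added for illustration.

```python
def heterozygosity_rate(minor_allele_frequency):
    """Expected heterozygosity 2 * P_a * (1 - P_a) for a biallelic marker.

    P_a is the frequency of the less common allele, so it cannot exceed 0.5.
    """
    p_a = minor_allele_frequency
    if not 0.0 <= p_a <= 0.5:
        raise ValueError("minor allele frequency must lie between 0 and 0.5")
    return 2.0 * p_a * (1.0 - p_a)


# Two illustrative frequencies for the less common allele.
rate_at_20_percent = heterozygosity_rate(0.20)
rate_at_30_percent = heterozygosity_rate(0.30)
```

The rate peaks at 0.5 when both alleles are equally frequent, which is why markers with a higher frequency of the less common allele are more informative.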

The term “genotype” as used herein refers to the identity of the alleles present in an individual or a sample. In the context of the present invention, a genotype preferably refers to the description of the biallelic marker alleles present in an individual or a sample. The term “genotyping” a sample or an individual for a biallelic marker consists of determining the specific allele or the specific nucleotide carried by an individual at a biallelic marker.

The term “mutation” as used herein refers to a difference in DNA sequence between or among different genomes or individuals which has a frequency below 1%.

The term “haplotype” refers to a combination of alleles present in an individual or a sample. In the context of the present invention, a haplotype preferably refers to a combination of biallelic marker alleles found in a given individual and which may be associated with a phenotype.

The term “polymorphism” as used herein refers to the occurrence of two or more alternative genomic sequences or alleles between or among different genomes or individuals. “Polymorphic” refers to the condition in which two or more variants of a specific genomic sequence can be found in a population. A “polymorphic site” is the locus at which the variation occurs. A single nucleotide polymorphism is the replacement of one nucleotide by another nucleotide at the polymorphic site. Deletion of a single nucleotide or insertion of a single nucleotide also gives rise to single nucleotide polymorphisms. In the context of the present invention, “single nucleotide polymorphism” preferably refers to a single nucleotide substitution. Typically, between different individuals, the polymorphic site may be occupied by two different nucleotides.

The terms “biallelic polymorphism” and “biallelic marker” are used interchangeably herein to refer to a single nucleotide polymorphism having two alleles at a fairly high frequency in the population. A “biallelic marker allele” refers to the nucleotide variants present at a biallelic marker site. Typically, the frequency of the less common allele of the biallelic markers of the present invention has been validated to be greater than 1%, preferably the frequency is greater than 10%, more preferably the frequency is at least 20% (i.e. heterozygosity rate of at least 0.32), even more preferably the frequency is at least 30% (i.e. heterozygosity rate of at least 0.42). A biallelic marker wherein the frequency of the less common allele is 30% or more is termed a “high quality biallelic marker”.
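The frequency thresholds in this definition can be sketched as a simple classifier. The tier labels other than “high quality biallelic marker” are hypothetical names chosen for illustration; only the cut-off values (1%, 10%, 20%, 30%) come from the text.

```python
def marker_tier(minor_allele_frequency):
    """Classify a biallelic marker by the frequency of its less common allele."""
    f = minor_allele_frequency
    if not 0.0 <= f <= 0.5:
        raise ValueError("minor allele frequency must lie between 0 and 0.5")
    if f >= 0.30:
        return "high quality biallelic marker"  # term used in the text
    if f >= 0.20:
        return "more preferred"                 # hypothetical label
    if f > 0.10:
        return "preferred"                      # hypothetical label
    if f > 0.01:
        return "validated"                      # hypothetical label
    return "below 1% threshold"                 # hypothetical label


# Illustrative frequencies, not measured values from the application.
tiers = [marker_tier(f) for f in (0.35, 0.20, 0.15, 0.05, 0.005)]
```

The boundaries follow the wording of the definition: “greater than” for the 1% and 10% cut-offs and “at least” for the 20% and 30% cut-offs.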

The location of nucleotides in a polynucleotide with respect to the center of the polynucleotide is described herein in the following manner. When a polynucleotide has an odd number of nucleotides, the nucleotide at an equal distance from the 3′ and 5′ ends of the polynucleotide is considered to be “at the center” of the polynucleotide, and any nucleotide immediately adjacent to the nucleotide at the center, or the nucleotide at the center itself is considered to be “within 1 nucleotide of the center.” With an odd number of nucleotides in a polynucleotide any of the five nucleotide positions in the middle of the polynucleotide would be considered to be within 2 nucleotides of the center, and so on. When a polynucleotide has an even number of nucleotides, there would be a bond and not a nucleotide at the center of the polynucleotide. Thus, either of the two central nucleotides would be considered to be “within 1 nucleotide of the center” and any of the four nucleotides in the middle of the polynucleotide would be considered to be “within 2 nucleotides of the center”, and so on. For polymorphisms which involve the substitution, insertion or deletion of 1 or more nucleotides, the polymorphism, allele or biallelic marker is “at the center” of a polynucleotide if the difference between the distance from the substituted, inserted, or deleted nucleotides of the polymorphism and the 3′ end of the polynucleotide, and the distance from the substituted, inserted, or deleted nucleotides of the polymorphism and the 5′ end of the polynucleotide is zero or one nucleotide. If this difference is 0 to 3, then the polymorphism is considered to be “within 1 nucleotide of the center.” If the difference is 0 to 5, the polymorphism is considered to be “within 2 nucleotides of the center.” If the difference is 0 to 7, the polymorphism is considered to be “within 3 nucleotides of the center,” and so on.
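The rule given above for polymorphisms can be sketched in Python. This sketch assumes that “distance” means the number of nucleotides lying between the polymorphism and the respective end, and it generalizes the stated bounds (0 to 3, 0 to 5, 0 to 7) to a difference of at most 2n + 1 for “within n nucleotides of the center”. The function names and the 47-nucleotide example are hypothetical.

```python
def end_distance_difference(length, start, end):
    """Absolute difference between the 5' and 3' flank sizes of a polymorphism.

    start and end are 0-based, inclusive positions of the substituted,
    inserted, or deleted nucleotides within a polynucleotide of `length`.
    """
    if not 0 <= start <= end < length:
        raise ValueError("polymorphism must lie inside the polynucleotide")
    distance_to_5_prime = start
    distance_to_3_prime = length - 1 - end
    return abs(distance_to_5_prime - distance_to_3_prime)


def is_at_center(length, start, end):
    """True when the difference between the two distances is zero or one."""
    return end_distance_difference(length, start, end) <= 1


def is_within_n_of_center(length, start, end, n):
    """True when the difference is at most 2n + 1 (3 for n=1, 5 for n=2, ...)."""
    return end_distance_difference(length, start, end) <= 2 * n + 1


# Hypothetical 47-nucleotide polynucleotide with a single-base marker.
centered = is_at_center(47, 23, 23)
near_center = is_within_n_of_center(47, 22, 22, 1)
```

A multi-nucleotide insertion is handled the same way: a 7-nucleotide insertion occupying positions 20 to 26 of a 47-nucleotide polynucleotide has 20 nucleotides on each side and is therefore at the center.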

The term “upstream” is used herein to refer to a location which is toward the 5′ end of the polynucleotide from a specific reference point.

The terms “base paired” and “Watson & Crick base paired” are usedinterchangeably herein to refer to nucleotides which can be hydrogenbonded to one another by virtue of their sequence identities in a mannerlike that found in double-helical DNA with thymine or uracil residueslinked to adenine residues by two hydrogen bonds and cytosine andguanine residues linked by three hydrogen bonds (See Stryer, L.,Biochemistry, 4^(th) edition, 1995).

The terms “complementary” or “complement thereof” are used herein torefer to the sequences of polynucleotides that are capable of formingWatson & Crick base pairing with another specified polynucleotidethroughout the entirety of the complementary region. For the purpose ofthe present invention, a first polynucleotide is deemed to becomplementary to a second polynucleotide when each base in the firstpolynucleotide is paired with its complementary base. Complementarybases are, generally, A and T (or A and U), or C and G. “Complement” isused herein as a synonym from “complementary polynucleotide”,“complementary nucleic acid” and “complementary nucleotide sequence”.These terms are applied to pairs of polynucleotides based solely upontheir sequences and not any particular set of conditions under which thetwo polynucleotides would actually bind.
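Because complementarity as defined above depends solely on sequence, it can be checked mechanically. The following Python sketch is illustrative only; the function name is hypothetical, and it assumes both sequences are written 5′ to 3′ and pair in antiparallel orientation, as in double-helical DNA.

```python
# Watson & Crick pairs named in the definition: A-T (or A-U) and C-G.
WATSON_CRICK_PAIRS = {
    ("A", "T"), ("T", "A"),
    ("A", "U"), ("U", "A"),
    ("C", "G"), ("G", "C"),
}


def is_complementary(first, second):
    """Return True when every base of `first` pairs with its
    complementary base in `second` over the entire length.

    Both sequences are assumed to be written 5'->3', so `second`
    is read in reverse to model antiparallel pairing.
    """
    first, second = first.upper(), second.upper()
    if len(first) != len(second):
        return False
    return all(
        (a, b) in WATSON_CRICK_PAIRS
        for a, b in zip(first, reversed(second))
    )
```

Under these assumptions, `is_complementary("ATGC", "GCAT")` is true, while a single mismatched base anywhere in the region makes the result false.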

The term “non-genic” is used herein to describe APM1-related biallelic markers, as well as polynucleotides and primers which occur outside the nucleotide positions shown in the human APM1 genomic sequence of SEQ ID NO: 1. The term “genic” is used herein to describe APM1-related biallelic markers as well as polynucleotides and primers which do occur in the nucleotide positions shown in the human APM1 genomic sequence of SEQ ID NO: 1.

Variants and Fragments

The invention also relates to variants and fragments of the polynucleotides described herein, particularly of an APM1 gene containing one or more biallelic markers according to the invention.

Variants of polynucleotides, as the term is used herein, are polynucleotides that differ from a reference polynucleotide. A variant of a polynucleotide may be a naturally occurring variant such as a naturally occurring allelic variant, or it may be a variant that is not known to occur naturally. Such non-naturally occurring variants of the polynucleotide may be made by mutagenesis techniques, including those applied to polynucleotides, cells or organisms. Generally, differences are limited so that the nucleotide sequences of the reference and the variant are closely similar overall and, in many regions, identical.

Variants of polynucleotides according to the invention include, without being limited to, nucleotide sequences which are at least 95% identical to a polynucleotide selected from the group consisting of SEQ ID Nos 1-4, or to any polynucleotide fragment of at least 8 consecutive nucleotides of a polynucleotide selected from the group consisting of SEQ ID Nos 1-3, and preferably at least 99% identical, more preferably at least 99.5% identical, and most preferably at least 99.8% identical to a polynucleotide selected from the group consisting of SEQ ID Nos 1-5 or to any polynucleotide fragment of at least 8 consecutive nucleotides of a polynucleotide selected from the group consisting of SEQ ID Nos 1-3.

Nucleotide changes present in a variant polynucleotide may be silent, which means that they do not alter the amino acids encoded by the polynucleotide. However, nucleotide changes may also result in amino acid substitutions, additions, deletions, fusions and truncations in the polypeptide encoded by the reference sequence. The substitutions, deletions or additions may involve one or more nucleotides. The variants may be altered in coding or non-coding regions or both. Alterations in the coding regions may produce conservative or non-conservative amino acid substitutions, deletions or additions.

In the context of the present invention, particularly preferred embodiments are those in which the polynucleotides encode polypeptides that increase or retain substantially the same biological function or activity as the mature APM1 protein, or those in which the polynucleotides encode polypeptides that maintain or increase a particular biological activity, while reducing or maintaining a second biological activity.

A polynucleotide fragment is a polynucleotide having a sequence that is entirely the same as part but not all of a given nucleotide sequence, preferably the nucleotide sequence of an APM1 gene, and variants thereof. The fragment can be a portion of an intron of an APM1 gene. It can also be a portion of the regulatory regions of APM1, preferably of the promoter sequence of the APM1 gene. Preferably, such fragments comprise at least one of the biallelic markers A1 to A26 or the complements thereto or a biallelic marker in linkage disequilibrium with one or more of the biallelic markers A1 to A26.

Such fragments may be “free-standing”, i.e. not part of or fused to other polynucleotides, or they may be comprised within a single larger polynucleotide of which they form a part or region. Indeed, several of these fragments may be present within a single larger polynucleotide.

Optionally, such fragments may consist of, or consist essentially of, a contiguous span of at least 8, 10, 12, 15, 18, 20, 25, 35, 40, 50, 70, 80, 100, 250, 500 or 1000 nucleotides in length. A set of preferred fragments contain at least one of the biallelic markers A1 to A26 of the APM1 gene which are described herein or the complements thereto.

Identity Between Nucleic Acids Or Polypeptides

The terms “percentage of sequence identity” and “percentage homology” are used interchangeably herein to refer to comparisons among polynucleotides and polypeptides, and are determined by comparing two optimally aligned sequences over a comparison window, wherein the portion of the polynucleotide or polypeptide sequence in the comparison window may comprise additions or deletions (i.e., gaps) as compared to the reference sequence (which does not comprise additions or deletions, although gaps may result where the other sequence contains additions) for optimal alignment of the two sequences. The percentage is calculated by determining the number of positions at which the identical nucleic acid base or amino acid residue occurs in both sequences to yield the number of matched positions, dividing the number of matched positions by the total number of positions in the window of comparison and multiplying the result by 100 to yield the percentage of sequence identity. Homology is evaluated using any of the variety of sequence comparison algorithms and programs known in the art. Such algorithms and programs include, but are by no means limited to, TBLASTN, BLASTP, FASTA, TFASTA, and CLUSTALW (Pearson and Lipman, 1988, Proc. Natl. Acad. Sci. USA 85(8):2444-2448; Altschul et al., 1990, J. Mol. Biol. 215(3):403-410; Thompson et al., 1994, Nucleic Acids Res. 22(22):4673-4680; Higgins et al., 1996, Methods Enzymol. 266:383-402; Altschul et al., 1993, Nature Genetics 3:266-272). In a particularly preferred embodiment, protein and nucleic acid sequence homologies are evaluated using the Basic Local Alignment Search Tool (“BLAST”) which is well known in the art (see, e.g., Karlin and Altschul, 1990, Proc. Natl. Acad. Sci. USA 87:2267-2268; Altschul et al., 1990, J. Mol. Biol. 215:403-410; Altschul et al., 1993, Nature Genetics 3:266-272; Altschul et al., 1997, Nuc. Acids Res. 25:3389-3402). In particular, five specific BLAST programs are used to perform the following tasks:

(1) BLASTP and BLAST3 compare an amino acid query sequence against a protein sequence database;

(2) BLASTN compares a nucleotide query sequence against a nucleotide sequence database;

(3) BLASTX compares the six-frame conceptual translation products of a query nucleotide sequence (both strands) against a protein sequence database;

(4) TBLASTN compares a query protein sequence against a nucleotide sequence database translated in all six reading frames (both strands); and

(5) TBLASTX compares the six-frame translations of a nucleotide query sequence against the six-frame translations of a nucleotide sequence database.

The BLAST programs identify homologous sequences by identifying similar segments, which are referred to herein as “high-scoring segment pairs,” between a query amino or nucleic acid sequence and a test sequence which is preferably obtained from a protein or nucleic acid sequence database. High-scoring segment pairs are preferably identified (i.e., aligned) by means of a scoring matrix, many of which are known in the art. Preferably, the scoring matrix used is the BLOSUM62 matrix (Gonnet et al., 1992, Science 256:1443-1445; Henikoff and Henikoff, 1993, Proteins 17:49-61). Less preferably, the PAM or PAM250 matrices may also be used (see, e.g., Schwartz and Dayhoff, eds., 1978, Matrices for Detecting Distance Relationships: Atlas of Protein Sequence and Structure, Washington: National Biomedical Research Foundation). The BLAST programs evaluate the statistical significance of all high-scoring segment pairs identified, and preferably select those segments which satisfy a user-specified threshold of significance, such as a user-specified percent homology. Preferably, the statistical significance of a high-scoring segment pair is evaluated using the statistical significance formula of Karlin (see, e.g., Karlin and Altschul, 1990, Proc. Natl. Acad. Sci. USA 87:2267-2268).
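The percentage calculation defined at the beginning of this section (matched positions divided by the total number of positions in the comparison window, multiplied by 100) can be sketched in Python as follows. This is a minimal illustration that assumes the two sequences have already been optimally aligned by one of the programs named above and that gaps are written as “-”; the function name is hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percentage of sequence identity over a comparison window.

    `aligned_a` and `aligned_b` are two already-aligned sequences of
    equal length in which gaps are marked with `gap` (an assumed
    convention). A position counts as matched only when both
    sequences carry the same residue and neither is a gap.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("comparison window is empty")
    matched = sum(
        1 for x, y in zip(aligned_a, aligned_b)
        if x == y and x != gap
    )
    # matched positions / total positions in the window * 100
    return 100.0 * matched / len(aligned_a)
```

For instance, a 10-position window containing one gap and one mismatch has 8 matched positions and therefore 80% identity under this calculation.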

Stringent Hybridization Conditions

By way of example and not limitation, procedures using conditions of high stringency are as follows: Prehybridization of filters containing DNA is carried out for 8 h to overnight at 65° C. in buffer composed of 6×SSC, 50 mM Tris-HCl (pH 7.5), 1 mM EDTA, 0.02% PVP, 0.02% Ficoll, 0.02% BSA, and 500 μg/ml denatured salmon sperm DNA. Filters are hybridized for 48 h at 65° C., the preferred hybridization temperature, in prehybridization mixture containing 100 μg/ml denatured salmon sperm DNA and 5-20×10⁶ cpm of ³²P-labeled probe. Alternatively, the hybridization step can be performed at 65° C. in the presence of SSC buffer, 1×SSC corresponding to 0.15 M NaCl and 0.05 M Na citrate. Subsequently, filter washes can be done at 37° C. for 1 h in a solution containing 2×SSC, 0.01% PVP, 0.01% Ficoll, and 0.01% BSA, followed by a wash in 0.1×SSC at 50° C. for 45 min. Alternatively, filter washes can be performed in a solution containing 2×SSC and 0.1% SDS, or 0.5×SSC and 0.1% SDS, or 0.1×SSC and 0.1% SDS at 68° C. for 15-minute intervals. Following the wash steps, the hybridized probes are detectable by autoradiography. Other conditions of high stringency which may be used are well known in the art and are described in Sambrook et al., 1989, and Ausubel et al., 1989, which are incorporated herein in their entirety. These hybridization conditions are suitable for a nucleic acid molecule of about 20 nucleotides in length. The hybridization conditions described above are adapted according to the length of the desired nucleic acid, following techniques well known to the one skilled in the art. Hybridization conditions may, for example, be adapted according to the teachings disclosed in the book of Hames and Higgins (1985) or in Sambrook et al. (1989).

Genomic Sequences of APM1

The present invention concerns the genomic sequence of APM1. The present invention encompasses polynucleotides, APM1 genes, or APM1 genomic sequences consisting of, consisting essentially of, or comprising the sequence of SEQ ID No 1, a sequence complementary thereto, as well as fragments and variants thereof. These polynucleotides may be purified, isolated, or recombinant. This genomic sequence of APM1 has been localized on locus 3p27 by FISH.

The invention also encompasses purified, isolated, or recombinant polynucleotides comprising a nucleotide sequence having at least 70, 75, 80, 85, 90, or 95% nucleotide identity with a nucleotide sequence of SEQ ID No 1 or a complementary sequence thereto or a fragment thereof. The nucleotide differences with respect to the nucleotide sequence of SEQ ID No 1 may be generally randomly distributed throughout the entire nucleic acid. Nevertheless, preferred nucleic acids are those wherein the nucleotide differences with respect to the nucleotide sequence of SEQ ID No 1 are predominantly located outside the coding sequences contained in the exons. These nucleic acids, as well as their fragments and variants, may be used as oligonucleotide primers or probes in order to detect the presence of a copy of the APM1 gene in a test sample, or alternatively in order to amplify a target nucleotide sequence within the APM1 sequences.

Another object of the invention consists of a purified, isolated, or recombinant nucleic acid that hybridizes with the nucleotide sequence of SEQ ID No 1 or a complementary sequence thereto or a variant thereof, under the stringent hybridization conditions as defined above.

Particularly preferred nucleic acids of the invention include isolated, purified, or recombinant polynucleotides comprising a contiguous span of at least 12, 15, 18, 20, 25, 30, 35, 40, 50, 60, 70, 80, 90, 100, 150, 200, 500, or 1000 nucleotides of SEQ ID No: 1 or the complements thereof, wherein said contiguous span comprises at least 1, 2, 3, 5, or 10 of the following nucleotide positions of SEQ ID No 1: 1 to 3528, 4852 to 15143, 15366 to 16276, and 20560 to 20966. Other preferred nucleic acids of the invention include isolated, purified, or recombinant polynucleotides comprising a contiguous span of at least 12, 15, 18, 20, 25, 30, 35, 40, 50, 60, 70, 80, 90, 100, 150, 200, 500, or 1000 nucleotides of SEQ ID No: 1 or the complements thereof, wherein said contiguous span comprises positions 4150 to 4154, or 17169 to 17170 of SEQ ID No: 1. Additional preferred nucleic acids of the invention include isolated, purified, or recombinant polynucleotides comprising a contiguous span of at least 12, 15, 18, 20, 25, 30, 35, 40, 50, 60, 70, 80, 90, 100, 150, 200, 500, or 1000 nucleotides of SEQ ID No: 1 or the complements thereof, wherein said contiguous span comprises a G at position 3787, a G at position 3809, a T at position 4311, an A at position 4328, an A at position 4683, or an A at position 15319 of SEQ ID No: 1. Additional preferred nucleic acids of the invention include isolated, purified, or recombinant polynucleotides comprising a contiguous span of at least 12, 15, 18, 20, 25, 30, 35, 40, 50, 60, 70, 80, 90, 100, 150, 200, 500, or 1000 nucleotides of SEQ ID No: 1 or the complements thereof, wherein said contiguous span comprises a G at position 15196, a deletion of an A at position 17170, a G at position 17829, an A at position 18011, and a T at position 18489. It should be noted that nucleic acid fragments of any size and sequence may also be comprised by the polynucleotides described in this section. Other particularly preferred nucleic acids of the invention include isolated, purified, or recombinant polynucleotides comprising a contiguous span of at least 12, 15, 18, 20, 25, 30, 35, 40, 50, 60, 70, 80, 90, 100, 150, 200, 500, or 1000 nucleotides of SEQ ID No: 1 or the complements thereof, wherein said contiguous span comprises at least 1, 2, 3, 5, or 10 of nucleotide positions 1 to 4833 of SEQ ID No 1.
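As an illustration of the first criterion above, the following Python sketch counts how many positions of a candidate contiguous span fall within the listed position ranges of SEQ ID No 1. The ranges are copied from the text; the function name and the 1-based, inclusive coordinates are assumptions made for the example.

```python
# Position ranges of SEQ ID No 1 listed in the text (1-based, inclusive).
LISTED_RANGES = [(1, 3528), (4852, 15143), (15366, 16276), (20560, 20966)]


def positions_in_ranges(span_start, span_end, ranges=LISTED_RANGES):
    """Count the positions of the span [span_start, span_end] that
    fall inside any of the given non-overlapping ranges."""
    if span_start > span_end:
        raise ValueError("span_start must not exceed span_end")
    total = 0
    for low, high in ranges:
        # Length of the overlap between the span and this range.
        overlap = min(span_end, high) - max(span_start, low) + 1
        if overlap > 0:
            total += overlap
    return total
```

Under these assumptions, a 20-nucleotide span covering positions 3520 to 3539 contains 9 of the listed positions, whereas a span confined to positions 4812 to 4851 contains none.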

The APM1 genomic nucleic acid comprises 3 exons. Exon 1 starts at the nucleotide in position 4812 and ends at the nucleotide in position 4851 of the nucleotide sequence of SEQ ID No 1; Exon 2 starts at the nucleotide in position 15144 and ends at the nucleotide in position 15365 of the nucleotide sequence of SEQ ID No 1; Exon 3 starts at the nucleotide in position 16277 and ends at the nucleotide in position 20559 of the nucleotide sequence of SEQ ID No 1. Thus, the invention embodies purified, isolated, or recombinant polynucleotides comprising a nucleotide sequence selected from the group consisting of the three exons of the APM1 gene, or a sequence complementary thereto. The invention also deals with purified, isolated, or recombinant nucleic acids comprising a combination of at least two exons of the APM1 gene, wherein the polynucleotides are arranged within the nucleic acid, from the 5′-end to the 3′-end of said nucleic acid, in the same order as in SEQ ID No 1.

Intron 1 (nucleotide sequence located between Exon 1 and Exon 2) starts at the nucleotide in position 4852 of the nucleotide sequence of SEQ ID No 1 and ends at the nucleotide in position 15143 of the nucleotide sequence of SEQ ID No 1; Intron 2 (nucleotide sequence located between Exon 2 and Exon 3) starts at the nucleotide in position 15366 and ends at the nucleotide in position 16276 of the nucleotide sequence of SEQ ID No 1. Thus, the invention embodies purified, isolated, or recombinant polynucleotides comprising a nucleotide sequence selected from the group consisting of Intron 1 and Intron 2 of the APM1 gene, and sequences complementary thereto.
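The exon and intron boundaries given above are mutually consistent, as the following Python sketch shows by deriving the intron coordinates from the exon coordinates. The coordinates are taken from the text; the variable and function names are hypothetical, and positions are treated as 1-based and inclusive.

```python
# Exon boundaries in SEQ ID No 1 as stated in the text (1-based, inclusive).
EXONS = [(4812, 4851), (15144, 15365), (16277, 20559)]


def introns_between(exons):
    """Derive intron intervals as the gaps between consecutive exons."""
    return [
        (previous_end + 1, next_start - 1)
        for (_, previous_end), (next_start, _) in zip(exons, exons[1:])
    ]


def interval_length(interval):
    """Number of nucleotides in a 1-based, inclusive interval."""
    start, end = interval
    return end - start + 1


introns = introns_between(EXONS)
exon_lengths = [interval_length(exon) for exon in EXONS]
spliced_length = sum(exon_lengths)
```

With these coordinates the derived introns are 4852 to 15143 and 15366 to 16276, matching the stated boundaries, and the three exons total 4545 nucleotides, which equals the last position cited for the cDNA of SEQ ID No 5.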

While this section is entitled “Genomic Sequences of APM1,” it should be noted that nucleic acid fragments of any size and sequence may also be comprised by the polynucleotides described in this section, including those flanking the genomic sequences of APM1 on either side and/or between two or more such genomic sequences.

APM1 cDNA Sequences

The expression of the APM1 gene has been shown to lead to the production of at least one mRNA species with the nucleic acid sequence set forth in SEQ ID No 5.

Another object of the invention is a purified, isolated, or recombinant nucleic acid comprising the nucleotide sequence of SEQ ID No 5, complementary sequences thereto, as well as allelic variants, and fragments thereof. Moreover, preferred polynucleotides of the invention include purified, isolated, or recombinant APM1 cDNAs consisting of, consisting essentially of, or comprising the sequence of SEQ ID No: 5. Particularly preferred embodiments of the invention include isolated, purified, or recombinant polynucleotides comprising a contiguous span of at least 12, 15, 18, 20, 25, 30, 35, 40, 50, 60, 70, 80, 90, 100, 150, 200, 500, or 1000 nucleotides of SEQ ID No: 5 or the complements thereof, wherein said contiguous span comprises positions selected from the group consisting of a nucleotide T at the position 93, positions 1154-1157, a nucleotide G at the position 1997, positions 2083-2086, a nucleotide C at the position 2367, 2456, 2467, 2475, or 2631, a nucleotide A at the position 2778, positions 2785-2788, positions 2797-2801, a nucleotide T at the position 3594, a nucleotide G at the position 3684, positions 3697-3701, positions 4026-4027, a nucleotide T at the position 4053, 4078, 4533 or 4536 of SEQ ID No 5. Alternative preferred embodiments of the invention include isolated, purified, or recombinant polynucleotides comprising a contiguous span of at least 12, 15, 18, 20, 25, 30, 35, 40, 50, 60, 70, 80, 90, 100, 150, 200, 500, or 1000 nucleotides of SEQ ID No: 5 or the complements thereof, wherein said contiguous span comprises positions selected from the group consisting of a nucleotide G at the position 93, a nucleotide G at the position 1815, a nucleotide A at the position 1997, and a nucleotide T at position 2475 of SEQ ID NO:5. Additional particularly preferred embodiments of the invention include isolated, purified, or recombinant polynucleotides comprising a contiguous span of at least 12, 15, 18, 20, 25, 30, 35, 40, 50, 60, 70, 80, 90, 100, 150, 200, 500, or 1000 nucleotides of SEQ ID No: 5 or the complements thereof, wherein said contiguous span comprises at least 1, 2, 3, 5, or 10 of nucleotide positions 1 to 22 of SEQ ID No 5.

The cDNA of SEQ ID No 5 includes a 5′-UTR region starting from the nucleotide at position 1 and ending at the nucleotide in position 48 of SEQ ID No 5. The cDNA of SEQ ID No 5 includes a 3′-UTR region starting from the nucleotide at position 785 and ending at the nucleotide at position 4545 of SEQ ID No 5. At least two polyadenylation sites are present at position 2937 to 2942 and position 4525 to 4530 of SEQ ID No 5.

Consequently, the invention concerns purified, isolated, and recombinant nucleic acids comprising a nucleotide sequence of the 5′-UTR of the APM1 cDNA, a sequence complementary thereto, or an allelic variant thereof.

The sequence at the 5′-end of this cDNA, more particularly the nucleotide sequence comprising 1 to 367 of SEQ ID No 5, corresponds to the nucleotide sequence of a 5′-EST that was obtained from a human dystrophic muscle cDNA library, and characterized following the teachings of the PCT Application No WO 96/34981 and of the U.S. patent application Ser. No. 08/905,134 filed on Aug. 1, 1997. Polynucleotides comprising this 5′-EST are also part of the invention. This 5′-EST is set forth in SEQ ID No 4.

While this section is entitled “APM1 cDNA Sequences,” it should be noted that nucleic acid fragments of any size and sequence may also be comprised by the polynucleotides described in this section, including those flanking the genomic sequences of APM1 and/or between two or more such genomic sequences.

Regulatory Sequences of APM1

The genomic sequence of the APM1 gene contains regulatory sequences both in the non-coding 5′-flanking region and in the non-coding 3′-flanking region that border the APM1 coding region containing the three exons of this gene, as well as in the introns.

The 5′-regulatory sequence of the APM1 gene comprises the nucleotide sequence of SEQ ID No 2, and from 1 to 4811 of SEQ ID No 1. This polynucleotide contains the promoter site.

The 3′-regulatory sequence of the APM1 gene comprises the nucleotide sequence of SEQ ID No 3, and from 20560 to 20966 of SEQ ID No 1.

Polynucleotides derived from the SEQ ID Nos 2 or 3 are useful in order to detect the presence of at least a copy of a nucleotide sequence of SEQ ID No 1 or a fragment thereof in a test sample. They are also useful to express APM1 or a heterologous protein in cells.

The promoter activity of the regulatory regions of APM1 can be assessed as described below.

Methods to identify relevant biologically active polynucleotide fragments or variants of SEQ ID Nos 2 and 3 are known to one with skill in the art, and exemplary methods are described in Sambrook et al. (Sambrook, 1989). For example, the presence of a promoter (or other regulatory sequences) in test sequences can be determined by splicing the test sequences (fragments or variants of SEQ ID Nos 2 and 3, for example) into a recombinant vector carrying a marker gene (e.g., beta galactosidase, chloramphenicol acetyl transferase, etc.) that is expressed only if a promoter (or other regulatory sequences) is present in the test sequences. Genomic sequences located upstream of the first exon of the APM1 gene can be cloned into a suitable promoter reporter vector, such as the pSEAP-Basic, pSEAP-Enhancer, pβgal-Basic, pβgal-Enhancer, or pEGFP-1 Promoter Reporter vectors available from Clontech, or pGL2-basic or pGL3-basic promoterless luciferase reporter gene vector from Promega. Briefly, each of these promoter reporter vectors includes multiple cloning sites positioned upstream of a reporter gene encoding a readily assayable protein such as secreted alkaline phosphatase, luciferase, β galactosidase, or green fluorescent protein. The sequences upstream from the APM1 coding region are inserted into the cloning sites upstream of the reporter gene in both orientations and introduced into an appropriate host cell. The level of reporter protein is assayed and compared to the level obtained from a vector which lacks an insert in the cloning site. The presence of an elevated expression level in the vector containing the insert with respect to the control vector indicates the presence of a promoter in the insert. If necessary, the upstream sequences can be cloned into vectors which contain an enhancer for increasing transcription levels from weak promoter sequences. A significant level of expression above that observed with the vector lacking an insert indicates that a promoter sequence is present in the inserted upstream sequence.
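The comparison between the insert-containing vector and the vector lacking an insert can be expressed as a simple fold-over-control calculation. The Python sketch below is illustrative only: the function names, the use of replicate means, and the two-fold default cut-off are assumptions, and the text itself does not specify any particular threshold or statistical test.

```python
from statistics import mean


def fold_over_control(insert_readings, control_readings):
    """Ratio of the mean reporter signal from the vector containing
    the insert to the mean signal from the vector lacking an insert."""
    control = mean(control_readings)
    if control <= 0:
        raise ValueError("control signal must be positive")
    return mean(insert_readings) / control


def suggests_promoter(insert_readings, control_readings, min_fold=2.0):
    """True when reporter expression is elevated over the control by at
    least `min_fold` (a hypothetical cut-off chosen for this example)."""
    return fold_over_control(insert_readings, control_readings) >= min_fold
```

As a usage example with made-up readings, replicate signals of 900, 1100 and 1000 for the insert vector against 100, 90 and 110 for the control give a ten-fold elevation, which this sketch would flag as suggesting promoter activity in the insert.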

A promoter sequence within the upstream genomic DNA may be further defined by constructing nested 5′ and/or 3′ deletions in the upstream DNA using conventional techniques such as Exonuclease III or appropriate restriction endonuclease digestion. The resulting deletion fragments can be inserted into the promoter reporter vector to determine whether the deletion has reduced or obliterated promoter activity, such as described, for example, by Coles et al. (1998). In this way, the boundaries of the promoters may be defined. If desired, potential individual regulatory sites within the promoter may be identified using site directed mutagenesis or linker scanning to obliterate potential transcription factor binding sites within the promoter individually or in combination. The effects of these mutations on transcription levels may be determined by inserting the mutations into cloning sites in promoter reporter vectors. This type of assay is well-known to those skilled in the art and is described in WO 97/17359, U.S. Pat. No. 5,374,544, EP 582 796, U.S. Pat. No. 5,698,389, U.S. Pat. No. 5,643,746, U.S. Pat. No. 5,502,176, and U.S. Pat. No. 5,266,488, incorporated herein by reference in their entirety including any drawings, figures, or tables.

The strength and the specificity of the promoter of the APM1 gene can be assessed through the expression levels of a detectable polynucleotide operably linked to the APM1 promoter in different types of cells and tissues. The detectable polynucleotide may be either a polynucleotide that specifically hybridizes with a predefined oligonucleotide probe, or a polynucleotide encoding a detectable protein, including an APM1 polypeptide or a fragment or a variant thereof. This type of assay is well-known to those skilled in the art and is described in U.S. Pat. No. 5,502,176, and U.S. Pat. No. 5,266,488, incorporated herein by reference in their entirety including any drawings, figures, or tables. In addition, some of the methods are discussed in more detail below.

Polynucleotides carrying the regulatory elements located at the 5′ end and at the 3′ end of the APM1 coding region may be advantageously used to control the transcriptional and translational activity of a heterologous polynucleotide of interest.

Thus, the present invention also concerns a purified or isolated nucleic acid comprising a polynucleotide which is selected from the group consisting of the nucleotide sequences of SEQ ID Nos 2 and 3, or a sequence complementary thereto or a biologically active fragment or variant thereof.

The invention also pertains to a purified or isolated nucleic acid comprising a polynucleotide having at least 95% nucleotide identity with a polynucleotide selected from the group consisting of the nucleotide sequences of SEQ ID Nos 2 and 3, advantageously 99% nucleotide identity, preferably 99.5% nucleotide identity and most preferably 99.8% nucleotide identity with a polynucleotide selected from the group consisting of the nucleotide sequences of SEQ ID Nos 2 and 3, or a sequence complementary thereto or a variant thereof or a biologically active fragment thereof.

Another object of the invention consists of purified, isolated or recombinant nucleic acids comprising a polynucleotide that hybridizes, under the stringent hybridization conditions defined herein, with a polynucleotide selected from the group consisting of the nucleotide sequences of SEQ ID Nos 2 and 3, or a sequence complementary thereto or a variant thereof or a biologically active fragment thereof.

Preferred fragments of the nucleic acid of SEQ ID No 2 have a length of about 1500 or 1000 nucleotides, preferably of about 500 nucleotides, more preferably about 400 nucleotides, even more preferably 300 nucleotides and most preferably about 200 nucleotides. Preferably the fragments of SEQ ID No 2 are within positions 1 to 3528.

Preferred fragments of the nucleic acid of SEQ ID No 3 are at least 50, 100, 150, 200, 300 or 400 bases in length.

By “biologically active” polynucleotide derivatives of SEQ ID Nos 1, 2 and 3 are meant polynucleotides comprising or alternatively consisting of a fragment of said polynucleotide which is functional as a regulatory region for expressing a recombinant polypeptide or a recombinant polynucleotide in a recombinant cell host. It could act either as an enhancer or as a repressor.

For the purpose of the invention, a nucleic acid or polynucleotide is “functional” as a regulatory region for expressing a recombinant polypeptide or a recombinant polynucleotide if said regulatory polynucleotide contains nucleotide sequences which contain transcriptional and translational regulatory information, and such sequences are “operably linked” to nucleotide sequences which encode the desired polypeptide or the desired polynucleotide.

The regulatory polynucleotides of the invention may be prepared from any of the nucleotide sequences of SEQ ID Nos 1, 2, and 3 by cleavage using suitable restriction enzymes, as described for example in Sambrook et al. (1989).

The regulatory polynucleotides may also be prepared by digestion of any of SEQ ID Nos 1, 2, and 3 by an exonuclease enzyme, such as Bal31 (Wabiko et al., 1986).

These regulatory polynucleotides can also be prepared by nucleic acid chemical synthesis, as described elsewhere in the specification.

The regulatory polynucleotides according to the invention may be part of a recombinant expression vector that may be used to express a coding sequence in a desired host cell or host organism. The recombinant expression vectors according to the invention are described elsewhere in the specification.

A preferred 5′-regulatory polynucleotide of the invention includes the 5′-untranslated region (5′-UTR) of the APM1 cDNA, or a biologically active fragment or variant thereof.

A preferred 3′-regulatory polynucleotide of the invention includes the 3′-untranslated region (3′-UTR) of the APM1 cDNA, or a biologically active fragment or variant thereof.

A further object of the invention consists of a purified or isolated nucleic acid comprising:

-   a) a nucleic acid comprising a regulatory nucleotide sequence selected from the group consisting of:
    -   (i) a nucleotide sequence comprising a polynucleotide of SEQ ID No 2 or a complementary sequence thereto;
    -   (ii) a nucleotide sequence comprising a polynucleotide having at least 95% of nucleotide identity with the nucleotide sequence of SEQ ID No 2 or a complementary sequence thereto;
    -   (iii) a nucleotide sequence comprising a polynucleotide that hybridizes under stringent hybridization conditions with the nucleotide sequence of SEQ ID No 2 or a complementary sequence thereto; and
    -   (iv) a biologically active fragment or variant of the polynucleotides in (i), (ii) and (iii);
-   b) a polynucleotide encoding a desired polypeptide or a nucleic acid of interest, operably linked to the nucleic acid defined in (a) above;
-   c) optionally, a nucleic acid comprising a 3′-regulatory polynucleotide, preferably a 3′-regulatory polynucleotide of the APM1 gene.

In a specific embodiment of the nucleic acid defined above, said nucleic acid includes the 5′-untranslated region (5′-UTR) of the APM1 cDNA, or a biologically active fragment or variant thereof.

In a second specific embodiment of the nucleic acid defined above, said nucleic acid includes the 3′-untranslated region (3′-UTR) of the APM1 cDNA, or a biologically active fragment or variant thereof.

The regulatory polynucleotide of SEQ ID No 2, or its biologically active fragments or variants, is operably linked at the 5′-end of the polynucleotide encoding the desired polypeptide or polynucleotide.

The regulatory polynucleotide of SEQ ID No 3, or its biologically active fragments or variants, is advantageously operably linked at the 3′-end of the polynucleotide encoding the desired polypeptide or polynucleotide.

The desired polypeptide encoded by the above-described nucleic acid maybe of various nature or origin, encompassing proteins of prokaryotic oreukaryotic origin. Among the polypeptides expressed under the control ofa APM1 regulatory region are included bacterial, fungal or viralantigens. Also encompassed are eukaryotic proteins including, but notlimited to, intracellular proteins such as “house keeping” proteins,membrane-bound proteins such as receptors, and secreted proteins such asendogenous mediators, for example cytokines. The desired polypeptide maybe the APM1 protein, especially the protein of the amino acid sequenceof SEQ ID No 6, or a fragment or a variant thereof.

The nucleic acid encoded by the above-described polynucleotide, usually an RNA molecule, may be complementary to a coding polynucleotide, for example to the APM1 coding sequence, and thus useful as an antisense polynucleotide.

Such a polynucleotide may be included in a recombinant expression vector in order to express the desired polypeptide or the desired nucleic acid in a host cell or in a host organism. Suitable recombinant vectors that contain a polynucleotide such as described hereinbefore are disclosed elsewhere in the specification.

Coding Regions

The APM1 open reading frame is contained in the corresponding mRNA of SEQ ID No 5. More precisely, the effective APM1 coding sequence (CDS) includes the region between nucleotide position 49 (first nucleotide of the ATG codon) and nucleotide position 783 (end nucleotide of the TGA codon) of SEQ ID No 5. The present invention also embodies isolated, purified, and recombinant polynucleotides which encode polypeptides comprising a contiguous span of at least 6 amino acids, preferably at least 8 or 10 amino acids, more preferably at least 12, 15, 20, 25, 30, 40, 50, or 100 amino acids of SEQ ID NO: 6, wherein said contiguous span includes a glutamic acid residue at amino acid position 56 in SEQ ID NO: 6.
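As a quick arithmetic check on the coordinates above, the following minimal sketch reads positions 49 and 783 as 1-based, inclusive coordinates (an assumption about the numbering convention). The function names are hypothetical, and the sequence is a made-up placeholder, not SEQ ID No 5.

```python
# Coordinates quoted in the text, read here as 1-based and inclusive (assumption).
CDS_START = 49   # first nucleotide of the ATG codon
CDS_END = 783    # last nucleotide of the TGA codon


def cds_length(start: int = CDS_START, end: int = CDS_END) -> int:
    """Number of nucleotides in an inclusive, 1-based interval."""
    return end - start + 1


def extract_cds(mrna: str, start: int = CDS_START, end: int = CDS_END) -> str:
    """Return the coding sequence from an mRNA written 5' to 3'."""
    if not 1 <= start <= end <= len(mrna):
        raise ValueError("CDS coordinates fall outside the sequence")
    return mrna[start - 1:end]


# Placeholder mRNA, NOT SEQ ID No 5:
# 48 nt leader, ATG, 243 filler codons, TGA, then a short tail.
toy_mrna = "C" * 48 + "ATG" + "GCT" * 243 + "TGA" + "A" * 20
cds = extract_cds(toy_mrna)

print(cds_length())            # nucleotides in the CDS
print(cds_length() // 3 - 1)   # sense codons, excluding the stop codon
print(cds[:3], cds[-3:])       # first and last codon of the toy CDS
```

Under this reading the interval spans 735 nucleotides, that is 245 codons including the TGA stop codon, which is a whole number of codons as expected for an open reading frame.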

The coding sequence of APM1 may be expressed in a desired host cell or a desired host organism, when this polynucleotide is placed under the control of suitable expression signals. The expression signals may be either the expression signals contained in the regulatory regions in the APM1 gene of the invention or may be exogenous regulatory nucleic sequences. Such a polynucleotide, when placed under suitable expression signals, may also be inserted in a vector for its expression and/or amplification.

Polynucleotide Constructs

The terms “polynucleotide construct” and “recombinant polynucleotide” are used interchangeably herein to refer to linear or circular, purified or isolated polynucleotides that have been artificially designed and which comprise at least two nucleotide sequences that are not found as contiguous nucleotide sequences in their initial natural environment.

DNA Construct That Enables Directing Temporal and Spatial APM1 Gene Expression in Recombinant Cell Hosts and in Transgenic Animals.

In order to study the physiological and phenotypic consequences of a lack of synthesis of the APM1 protein, both at the cell level and at the multicellular organism level, the invention also encompasses DNA constructs and recombinant vectors enabling a conditional expression of a specific allele of the APM1 genomic sequence or cDNA and also of a copy of this genomic sequence or cDNA harboring substitutions, deletions, or additions of one or more bases with respect to the APM1 nucleotide sequence of SEQ ID Nos 1 and 5, or a fragment thereof, these base substitutions, deletions or additions being located either in an exon, an intron or a regulatory sequence, but preferably in the 5′-regulatory sequence or in an exon of the APM1 genomic sequence or within the APM1 cDNA of SEQ ID No 5. In a preferred embodiment, the APM1 sequence comprises a biallelic marker of the present invention.

The present invention also embodies recombinant vectors comprised of isolated, purified, or recombinant polynucleotides which encode polypeptides comprising a contiguous span of at least 6 amino acids, preferably at least 8 or 10 amino acids, more preferably at least 12, 15, 20, 25, 30, 40, 50, or 100 amino acids of SEQ ID NO: 6, wherein said contiguous span includes a glutamic acid residue at amino acid position 56 in SEQ ID NO: 6. Particularly preferred embodiments of the invention include recombinant vectors comprised of isolated, purified, or recombinant polynucleotides comprising a contiguous span of at least 12, 15, 18, 20, 25, 30, 35, 40, 50, 60, 70, 80, 90, 100, 150, 200, 500, or 1000 nucleotides of SEQ ID No: 5 or the complements thereof, wherein said contiguous span comprises positions selected from the group consisting of a nucleotide T at the position 93, positions 1154-1157, a nucleotide G at the position 1997, positions 2083-2086, a nucleotide C at the position 2367, 2456, 2467, 2475, or 2631, a nucleotide A at the position 2778, positions 2785-2788, positions 2797-2801, a nucleotide T at the position 3594, a nucleotide G at the position 3684, positions 3697-3701, positions 4026-4027, a nucleotide T at the position 4053, 4078, 4533 or 4536 of SEQ ID No 5. Alternative preferred embodiments of the invention include isolated, purified, or recombinant polynucleotides comprising a contiguous span of at least 12, 15, 18, 20, 25, 30, 35, 40, 50, 60, 70, 80, 90, 100, 150, 200, 500, or 1000 nucleotides of SEQ ID No: 5 or the complements thereof, wherein said contiguous span comprises positions selected from the group consisting of a nucleotide G at the position 93, a nucleotide G at the position 1815, a nucleotide A at the position 1997, and a nucleotide T at position 2475 of SEQ ID NO: 5.

Other particularly preferred embodiments of the invention include recombinant vectors comprised of isolated, purified, or recombinant polynucleotides comprising a contiguous span of at least 12, 15, 18, 20, 25, 30, 35, 40, 50, 60, 70, 80, 90, 100, 150, 200, 500, or 1000 nucleotides of SEQ ID No: 5 or the complements thereof, wherein said contiguous span comprises at least 1, 2, 3, 5, or 10 of nucleotide positions 1 to 22 of SEQ ID No 5. Such embodiments are particularly useful in expression vectors, and when stably transfected into host cells and animals. Particularly preferred recombinant vectors of the invention are comprised of isolated, purified, or recombinant polynucleotides comprising a contiguous span of at least 12, 15, 18, 20, 25, 30, 35, 40, 50, 60, 70, 80, 90, 100, 150, 200, 500, or 1000 nucleotides of SEQ ID No: 1 or the complements thereof, wherein said contiguous span comprises at least 1, 2, 3, 5, or 10 of the following nucleotide positions of SEQ ID No 1: 1 to 3528, 4852 to 15143, 15366 to 16276, and 20560 to 20966. Other preferred recombinant vectors of the invention are comprised of isolated, purified, or recombinant polynucleotides comprising a contiguous span of at least 12, 15, 18, 20, 25, 30, 35, 40, 50, 60, 70, 80, 90, 100, 150, 200, 500, or 1000 nucleotides of SEQ ID No: 1 or the complements thereof, wherein said contiguous span comprises positions 4150 to 4154, or 17169 to 17170 of SEQ ID No: 1. Additional preferred recombinant vectors of the invention are comprised of isolated, purified, or recombinant polynucleotides comprising a contiguous span of at least 12, 15, 18, 20, 25, 30, 35, 40, 50, 60, 70, 80, 90, 100, 150, 200, 500, or 1000 nucleotides of SEQ ID No: 1 or the complements thereof, wherein said contiguous span comprises a G at position 3787, a G at position 3809, a T at position 4311, an A at position 4328, an A at position 4683, or an A at position 15319 of SEQ ID No: 1.

Other particularly preferred recombinant vectors of the invention include isolated, purified, or recombinant polynucleotides comprising a contiguous span of at least 12, 15, 18, 20, 25, 30, 35, 40, 50, 60, 70, 80, 90, 100, 150, 200, 500, or 1000 nucleotides of SEQ ID No: 1 or the complements thereof, wherein said contiguous span comprises at least 1, 2, 3, 5, or 10 of nucleotide positions 1 to 4833 of SEQ ID No 1.

A first preferred DNA construct is based on the tetracycline resistance operon tet from E. coli transposon Tn10 for controlling the APM1 gene expression, such as described by Gossen et al. (1992, 1995) and Furth et al. (1994). Such a DNA construct contains seven tet operator sequences from Tn10 (tetop) that are fused to either a minimal promoter or a 5′-regulatory sequence of the APM1 gene, said minimal promoter or said APM1 regulatory sequence being operably linked to a polynucleotide of interest that codes either for a sense or an antisense oligonucleotide or for a polypeptide, including an APM1 polypeptide or a peptide fragment thereof. This DNA construct is functional as a conditional expression system for the nucleotide sequence of interest when the same cell also comprises a nucleotide sequence coding for either the wild type (tTA) or the mutant (rTA) repressor fused to the activating domain of viral protein VP16 of herpes simplex virus, placed under the control of a promoter, such as the HCMVIE1 enhancer/promoter or the MMTV-LTR. Indeed, a preferred DNA construct of the invention comprises both the polynucleotide containing the tet operator sequences and the polynucleotide containing a sequence coding for the tTA or the rTA repressor.

In a specific embodiment, the conditional expression DNA construct contains the sequence encoding the mutant tetracycline repressor rTA, where the expression of the polynucleotide of interest is silent in the absence of tetracycline and induced in its presence.

DNA Constructs Allowing Homologous Recombination: Replacement Vectors

A second preferred DNA construct will comprise, from 5′-end to 3′-end: (a) a first nucleotide sequence that is comprised in the APM1 genomic sequence; (b) a nucleotide sequence comprising a positive selection marker, such as the marker for neomycin resistance (neo); and (c) a second nucleotide sequence that is comprised in the APM1 genomic sequence, and is located downstream from the first APM1 nucleotide sequence (a).

In a preferred embodiment, this DNA construct also comprises a negative selection marker located upstream from the nucleotide sequence (a) or downstream from the nucleotide sequence (c). Preferably, the negative selection marker consists of the thymidine kinase (tk) gene (Thomas et al., 1986), the hygromycin beta gene (Te Riele et al., 1990), the hprt gene (Van der Lugt et al., 1991; Reid et al., 1990) or the Diphtheria toxin A fragment (Dt-A) gene (Nada et al., 1993; Yagi et al., 1990). Preferably, the positive selection marker is located within an APM1 exon sequence so as to interrupt the sequence encoding an APM1 protein. These replacement vectors are described, for example, by Thomas et al. (1986; 1987), Mansour et al. (1988) and Koller et al. (1992).

The first and second nucleotide sequences (a) and (c) may be located within an APM1 regulatory sequence, an intronic sequence, an exon sequence or a sequence containing both regulatory and/or intronic and/or exon sequences. The size of the nucleotide sequences (a) and (c) ranges from 1 to 50 kb, preferably from 1 to 10 kb, more preferably from 2 to 6 kb and most preferably from 2 to 4 kb.

DNA Constructs Allowing Homologous Recombination

The present invention also encompasses primary, secondary, and immortalized homologously recombinant host cells of vertebrate origin, preferably mammalian origin and particularly human origin, that have been engineered to: a) insert exogenous (heterologous) polynucleotides into the endogenous chromosomal DNA of a targeted gene, b) delete endogenous chromosomal DNA, and/or c) replace endogenous chromosomal DNA with exogenous polynucleotides. Insertions, deletions, and/or replacements of polynucleotide sequences may be to the coding sequences of the targeted gene and/or to regulatory regions, such as promoter and enhancer sequences, operably associated with the targeted gene.

The present invention further relates to a method of making a homologously recombinant host cell in vitro or in vivo, wherein the expression of a targeted gene not normally expressed in the cell is altered. Preferably the alteration causes expression of the targeted gene under normal growth conditions or under conditions suitable for producing the polypeptide encoded by the targeted gene. The method comprises the steps of: (a) transfecting the cell in vitro or in vivo with a polynucleotide construct, the polynucleotide construct comprising: (i) a targeting sequence; (ii) a regulatory sequence and/or a coding sequence; and (iii) an unpaired splice donor site, if necessary, thereby producing a transfected cell; and (b) maintaining the transfected cell in vitro or in vivo under conditions appropriate for homologous recombination.

The present invention further relates to a method of altering the expression of a targeted gene in a cell in vitro or in vivo wherein the gene is not normally expressed in the cell, comprising the steps of: (a) transfecting the cell in vitro or in vivo with a polynucleotide construct, the polynucleotide construct comprising: (i) a targeting sequence; (ii) a regulatory sequence and/or a coding sequence; and (iii) an unpaired splice donor site, if necessary, thereby producing a transfected cell; (b) maintaining the transfected cell in vitro or in vivo under conditions appropriate for homologous recombination, thereby producing a homologously recombinant cell; and (c) maintaining the homologously recombinant cell in vitro or in vivo under conditions appropriate for expression of the gene.

The present invention further relates to a method of making a polypeptide of the present invention by altering the expression of a targeted endogenous gene in a cell in vitro or in vivo wherein the gene is not normally expressed in the cell, comprising the steps of: (a) transfecting the cell in vitro with a polynucleotide construct, the polynucleotide construct comprising: (i) a targeting sequence; (ii) a regulatory sequence and/or a coding sequence; and (iii) an unpaired splice donor site, if necessary, thereby producing a transfected cell; (b) maintaining the transfected cell in vitro or in vivo under conditions appropriate for homologous recombination, thereby producing a homologously recombinant cell; and (c) maintaining the homologously recombinant cell in vitro or in vivo under conditions appropriate for expression of the gene thereby making the polypeptide.

The present invention further relates to a polynucleotide construct which alters the expression of a targeted gene in a cell type in which the gene is not normally expressed. This occurs when a polynucleotide construct is inserted into the chromosomal DNA of the target cell, wherein the polynucleotide construct comprises: a) a targeting sequence; b) a regulatory sequence and/or coding sequence; and c) an unpaired splice-donor site, if necessary. Further included are polynucleotide constructs, as described above, wherein the construct further comprises a polynucleotide which encodes a polypeptide and is in-frame with the targeted endogenous gene after homologous recombination with chromosomal DNA.

The compositions may be produced, and methods performed, by techniques known in the art, such as those described in U.S. Pat. Nos. 6,054,288; 6,048,729; 6,048,724; 6,048,524; 5,994,127; 5,968,502; 5,965,125; 5,869,239; 5,817,789; 5,783,385; 5,733,761; 5,641,670; 5,580,734; International Publication Nos: WO 96/29411, WO 94/12650; and scientific articles including 1994; Koller et al., Proc. Natl. Acad. Sci. USA 86:8932-8935 (1989) (the disclosures of each of which are incorporated by reference in their entireties).

DNA Constructs Allowing Homologous Recombination: Cre-LoxP System.

These new DNA constructs make use of the site specific recombination system of the P1 phage. The P1 phage possesses a recombinase called Cre which interacts specifically with a 34 base pair loxP site. The loxP site is composed of two palindromic sequences of 13 bp separated by an 8 bp conserved sequence (Hoess et al., 1986). The recombination by the Cre enzyme between two loxP sites having an identical orientation leads to the deletion of the DNA fragment located between them.
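The 13 + 8 + 13 bp layout and the deletion outcome can be illustrated with a minimal string-level sketch. The loxP sequence used below is the one commonly cited for phage P1 and is not quoted from this specification; the function names are hypothetical, and the model only considers sites in the same orientation on the strand shown.

```python
# Commonly cited wild-type loxP sequence (assumption: not taken from this document).
LEFT_ARM = "ATAACTTCGTATA"    # 13 bp recombinase binding element
SPACER = "GCATACAT"           # 8 bp core that gives the site its orientation
RIGHT_ARM = "TATACGAAGTTAT"   # 13 bp inverted repeat of the left arm
LOXP = LEFT_ARM + SPACER + RIGHT_ARM

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def excise_between_loxp(seq: str) -> str:
    """Model Cre-mediated deletion between the first two same-orientation sites.

    The fragment between the sites is removed together with one copy of the
    site, so a single loxP site remains. Sequences with fewer than two sites
    are returned unchanged.
    """
    first = seq.find(LOXP)
    if first == -1:
        return seq
    second = seq.find(LOXP, first + len(LOXP))
    if second == -1:
        return seq
    return seq[:first + len(LOXP)] + seq[second + len(LOXP):]


floxed = "AAAA" + LOXP + "GGGGCCCC" + LOXP + "TTTT"
print(len(LOXP))                    # total length of the site
print(excise_between_loxp(floxed))  # flanks joined across one remaining site
```

This is a bookkeeping model only: it ignores sites in opposite orientation, which lead to inversion rather than deletion, and says nothing about recombination efficiency.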

The Cre-loxP system used in combination with a homologous recombination technique has been described by Gu et al. (1993, 1994). Briefly, a nucleotide sequence of interest to be inserted in a targeted location of the genome harbors at least two loxP sites in the same orientation and located at the respective ends of a nucleotide sequence to be excised from the recombinant genome. The excision event requires the presence of the recombinase (Cre) enzyme within the nucleus of the recombinant cell host. The recombinase enzyme may be provided by (a) incubating the recombinant cell hosts in a culture medium containing this enzyme, injecting the Cre enzyme directly into the desired cell, such as described by Araki et al. (1995), or delivering the enzyme into the cells by lipofection, such as described by Baubonis et al. (1993); (b) transfecting the cell host with a vector comprising the Cre coding sequence operably linked to a promoter functional in the recombinant cell host, which promoter is optionally inducible, said vector being introduced in the recombinant cell host, such as described by Gu et al. (1993) and Sauer et al. (1988); or (c) introducing in the genome of the cell host a polynucleotide comprising the Cre coding sequence operably linked to a promoter functional in the recombinant cell host, which promoter is optionally inducible, said polynucleotide being inserted in the genome of the cell host either by a random insertion event or a homologous recombination event, such as described by Gu et al. (1994).

In a specific embodiment, the vector containing the sequence to be inserted in the APM1 gene by homologous recombination is constructed in such a way that selectable markers are flanked by loxP sites in the same orientation. The selectable markers can be removed, using the Cre enzyme, while leaving the APM1 sequences of interest that have been inserted by a homologous recombination event. Again, two selectable markers are needed: a positive selection marker to select for the recombination event and a negative selection marker to select for the homologous recombination event. Vectors and methods using the Cre-loxP system are described by Zou et al. (1994).

Thus, a third preferred DNA construct of the invention comprises, from 5′-end to 3′-end: (a) a first nucleotide sequence that is comprised in the APM1 genomic sequence; (b) a nucleotide sequence comprising a polynucleotide encoding a positive selection marker, said nucleotide sequence comprising additionally two sequences defining a site recognized by a recombinase, such as a loxP site, the two sites being placed in the same orientation; and (c) a second nucleotide sequence that is comprised in the APM1 genomic sequence, and is located on the genome downstream of the first APM1 nucleotide sequence (a).

The sequences defining a site recognized by a recombinase, such as a loxP site, are preferably located within the nucleotide sequence (b) at suitable locations bordering the nucleotide sequence for which the conditional excision is sought. In one specific embodiment, two loxP sites are located at each side of the positive selection marker sequence in order to allow its excision at a desired time after the occurrence of the homologous recombination event.

In a preferred embodiment of a method using the third DNA construct described above, the excision of the polynucleotide fragment bordered by the two sites recognized by a recombinase, preferably two loxP sites, is performed at a desired time, due to the presence within the genome of the recombinant host cell of a sequence encoding the Cre enzyme operably linked to a promoter sequence, preferably an inducible promoter, more preferably a tissue-specific promoter sequence, and most preferably a promoter sequence which is both inducible and tissue-specific, such as described by Gu et al. (1994).

The presence of the Cre coding sequence within the genome of the recombinant cell host may be the result of the breeding of two transgenic animals, the first transgenic animal bearing the APM1-derived sequence of interest containing the loxP sites as described above and the second transgenic animal bearing the Cre coding sequence operably linked to a suitable promoter sequence, such as described by Gu et al. (1994).

Spatio-temporal control of the Cre enzyme expression may also be achieved with an adenovirus based vector that contains the Cre gene thus allowing infection of cells, or in vivo infection of organs, for delivery of the Cre enzyme, such as described by Anton and Graham (1995) and Kanegae et al. (1995).

The DNA constructs described above may be used to introduce a desired nucleotide sequence of the invention, preferably an APM1 genomic sequence or an APM1 cDNA sequence, and most preferably an altered copy of an APM1 genomic or cDNA sequence, within a predetermined location of the targeted genome, leading either to the generation of an altered copy of a targeted gene (knock-out homologous recombination) or to the replacement of a copy of the targeted gene by another copy sufficiently homologous to allow a homologous recombination event to occur (knock-in homologous recombination). In a specific embodiment, the DNA constructs described above may be used to introduce an APM1 genomic sequence or an APM1 cDNA sequence comprising at least one biallelic marker of the present invention, preferably at least one biallelic marker selected from the group consisting of A1 to A26.

Nuclear Antisense DNA Constructs

Other compositions contain a vector of the invention comprising an oligonucleotide fragment of the nucleic sequence SEQ ID No 5, preferably a fragment including the start codon of the APM1 gene, as an antisense tool that inhibits the expression of the corresponding APM1 gene. Preferred methods using antisense polynucleotides according to the present invention are described by Sczakiel et al. (1995) or PCT Application No WO 95/24223, hereby incorporated by reference herein in their entirety including any drawings, figures, or tables.

Preferably, the antisense tools are chosen among the polynucleotides (15-200 bp long) that are complementary to the 5′ end of the APM1 mRNA. In one embodiment, a combination of different antisense polynucleotides complementary to different parts of the desired targeted gene is used.
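The complementarity requirement can be sketched as a reverse-complement operation on a window of the target transcript. The snippet below is a minimal illustration that assumes the mRNA is written 5′ to 3′ in the DNA alphabet; the function names and the toy sequence are hypothetical and do not reproduce SEQ ID No 5.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def antisense_for(mrna: str, start: int, length: int) -> str:
    """Antisense sequence (5' to 3') against a window of the mRNA.

    `start` is a 1-based position on the mRNA; `length` is limited to the
    15-200 nucleotide range given in the text.
    """
    if not 15 <= length <= 200:
        raise ValueError("length must be between 15 and 200 nucleotides")
    if start < 1 or start - 1 + length > len(mrna):
        raise ValueError("window falls outside the mRNA")
    target = mrna[start - 1:start - 1 + length].upper()
    return reverse_complement(target)


# Toy transcript with an ATG near its 5' end (placeholder, not SEQ ID No 5).
toy_mrna = "GGGGATGCCCAAATTTGGGCCC"
oligo = antisense_for(toy_mrna, start=1, length=15)
print(oligo)
print("CAT" in oligo)  # the reverse complement of ATG appears in the antisense strand
```

A window that spans the start codon, as here, corresponds to the preferred case in which the antisense polynucleotide covers the translation initiation codon.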

Preferred antisense polynucleotides of the invention are complementary to mRNA sequences of APM1 containing either the translation initiation codon ATG or a splice site. Further preferred antisense polynucleotides are complementary to a splice site of APM1 mRNA.

Preferably, the antisense polynucleotides of the invention have a 3′ polyadenylation signal that has been replaced with a self-cleaving ribozyme sequence, such that RNA polymerase II transcripts are produced without poly(A) at their 3′ ends. These antisense polynucleotides are incapable of being exported from the nucleus (Liu et al. (1994)). In a preferred embodiment, these APM1 antisense polynucleotides also comprise, within the ribozyme cassette, a histone stem-loop structure to stabilize cleaved transcripts against 3′-5′ exonucleolytic degradation, such as the structure described by Eckner et al. (1991).

Oligonucleotide Probes And Primers

Polynucleotides derived from the APM1 gene are useful in order to detect the presence of at least a copy of a nucleotide sequence of SEQ ID No 1, or a fragment, complement, or variant thereof in a test sample.

Primers and probes according to the invention consist of nucleic acids comprising at least 12, 15, 18, 20, 25, 30, 40, 50, or 100 consecutive nucleotides of a nucleic acid selected from the group consisting of:

-   a) the nucleotide sequence beginning at the nucleotide in position 1 and ending at the nucleotide in position 4811 of the nucleotide sequence of SEQ ID No 1 or a variant thereof or a sequence complementary thereto; more particularly, the nucleotide sequence beginning at the nucleotide in position 1 and ending at the nucleotide in position 3528 of the nucleotide sequence of SEQ ID No 1 or a variant thereof or a sequence complementary thereto;
-   b) the nucleotide sequence beginning at the nucleotide in position 4853 and ending at the nucleotide in position 15143 of the nucleotide sequence of SEQ ID No 1 or a variant thereof or a sequence complementary thereto;
-   c) the nucleotide sequence beginning at the nucleotide in position 15366 and ending at the nucleotide in position 16276 of the nucleotide sequence of SEQ ID No 1 or a variant thereof or a sequence complementary thereto;
-   d) the nucleotide sequence beginning at the nucleotide in position 20560 and ending at the nucleotide in position 20966 of the nucleotide sequence of SEQ ID No 1 or a variant thereof or a sequence complementary thereto;
-   e) the nucleotide sequence beginning at the nucleotide in position 1 and ending at the nucleotide in position 22 of the nucleotide sequence of SEQ ID No 5 or a variant thereof or a sequence complementary thereto.

Thus, the invention also relates to nucleic acid probes characterized in that they hybridize specifically, under the stringent hybridization conditions defined above, with a nucleic acid selected from the group consisting of:

-   a) the nucleotide sequence beginning at the nucleotide in position 1 and ending at the nucleotide in position 4811 of the nucleotide sequence of SEQ ID No 1 or a variant thereof or a sequence complementary thereto; more particularly, the nucleotide sequence beginning at the nucleotide in position 1 and ending at the nucleotide in position 3528 of the nucleotide sequence of SEQ ID No 1 or a variant thereof or a sequence complementary thereto;
-   b) the nucleotide sequence beginning at the nucleotide in position 4853 and ending at the nucleotide in position 15143 of the nucleotide sequence of SEQ ID No 1 or a variant thereof or a sequence complementary thereto;
-   c) the nucleotide sequence beginning at the nucleotide in position 15366 and ending at the nucleotide in position 16276 of the nucleotide sequence of SEQ ID No 1 or a variant thereof or a sequence complementary thereto;
-   d) the nucleotide sequence beginning at the nucleotide in position 20560 and ending at the nucleotide in position 20966 of the nucleotide sequence of SEQ ID No 1 or a variant thereof or a sequence complementary thereto;
-   e) the nucleotide sequence beginning at the nucleotide in position 1 and ending at the nucleotide in position 22 of the nucleotide sequence of SEQ ID No 5 or a variant thereof or a sequence complementary thereto.

The formation of stable hybrids depends on the melting temperature (Tm) of the DNA. The Tm depends on the length of the primer or probe, the ionic strength of the solution and the G+C content. The higher the G+C content of the primer or probe, the higher is the melting temperature because G:C pairs are held by three H bonds whereas A:T pairs have only two. The GC content in the probes of the invention usually ranges between 10 and 75%, preferably between 35 and 60%, and more preferably between 40 and 55%.
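The dependence of Tm on base composition can be illustrated with a rough calculation. The sketch below computes GC content and applies the Wallace rule of thumb, Tm ≈ 2 °C × (A+T) + 4 °C × (G+C), which is an approximation for short oligonucleotides that is not taken from this specification and that ignores ionic strength. The function names are hypothetical; the range labels restate the GC ranges given above.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in a DNA sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq: str) -> int:
    """Rule-of-thumb Tm in degrees C: 2 per A or T, 4 per G or C.

    Assumption: only a rough guide for short oligonucleotides; it does not
    model salt concentration or probe concentration.
    """
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def gc_band(seq: str) -> str:
    """Classify a probe against the GC ranges stated in the text."""
    pct = 100 * gc_fraction(seq)
    if 40 <= pct <= 55:
        return "more preferred (40-55%)"
    if 35 <= pct <= 60:
        return "preferred (35-60%)"
    if 10 <= pct <= 75:
        return "usual (10-75%)"
    return "outside the usual range"


# Two made-up 20-mers of equal length but different composition.
at_rich = "ATATTAAATTTAGATATTAA"
gc_rich = "GCGGCCGTGCGCACGGCCGC"
for probe in (at_rich, gc_rich):
    print(probe, round(100 * gc_fraction(probe)), wallace_tm(probe), gc_band(probe))
```

For equal lengths, the GC-rich probe receives the higher estimate, which mirrors the three versus two hydrogen bond argument in the paragraph above.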

A probe or a primer according to the invention is between 8 and 1000 nucleotides in length, and ranges preferably from at least 8, 10, 12, 15, 18 or 20 to 25, 35, 40, 50, 60, 70, 80, 100, 250, 500 or 1000 contiguous nucleotides of the nucleotide sequence of SEQ ID Nos 1-3, or a variant thereof or a complementary sequence thereto, or is specified to be at least 12, 15, 18, 20, 25, 35, 40, 50, 60, 70, 80, 100, 250, 500 or 1000 contiguous nucleotides of the nucleotide sequence of SEQ ID Nos 1-3 or a variant thereof or a complementary sequence thereto. More particularly, the length of these probes can range from 8, 10, 15, 20, or 30 to 100 nucleotides, preferably from 10 to 50, more preferably from 15 to 30 nucleotides. Shorter probes tend to lack specificity for a target nucleic acid sequence and generally require cooler temperatures to form sufficiently stable hybrid complexes with the template. Longer probes are expensive to produce and can sometimes self-hybridize to form hairpin structures. The appropriate length for primers and probes under a particular set of assay conditions may be empirically determined by one of skill in the art. A preferred probe or primer consists of a nucleic acid comprising a polynucleotide selected from the group of nucleotide sequences consisting of B1 to B23, C1 to C24, D1 to D26 and E1 to E26.

Additionally, another preferred embodiment of a probe according to the invention consists of a nucleic acid comprising a biallelic marker selected from the group consisting of A1 to A26 or the complements thereto. Exemplary probes are given in Table 4 in the Examples.

Particularly preferred probes and primers of the invention include isolated, purified, or recombinant polynucleotides comprising a contiguous span of at least 12, 15, 18, 20, 25, 30, 35, 40, 50, 60, 70, 80, 90, 100, 150, 200, 500, or 1000 nucleotides of SEQ ID No: 1 or the complements thereof, wherein said contiguous span comprises at least 1, 2, 3, 5, or 10 of the following nucleotide positions of SEQ ID No 1: 1 to 3528, 4852 to 15143, 15366 to 16276, and 20560 to 20966. Other preferred primers and probes of the invention include isolated, purified, or recombinant polynucleotides comprising a contiguous span of at least 12, 15, 18, 20, 25, 30, 35, 40, 50, 60, 70, 80, 90, 100, 150, 200, 500, or 1000 nucleotides of SEQ ID No: 1 or the complements thereof, wherein said contiguous span comprises positions 4150 to 4154, or 17169 to 17170 of SEQ ID No: 1. Additional preferred primers and probes of the invention include isolated, purified, or recombinant polynucleotides comprising a contiguous span of at least 12, 15, 18, 20, 25, 30, 35, 40, 50, 60, 70, 80, 90, 100, 150, 200, 500, or 1000 nucleotides of SEQ ID No: 1 or the complements thereof, wherein said contiguous span comprises a G at position 3787, a G at position 3809, a T at position 4311, an A at position 4328, an A at position 4683, or an A at position 15319 of SEQ ID No: 1. Additional preferred primers and probes of the invention include isolated, purified, or recombinant polynucleotides comprising a contiguous span of at least 12, 15, 18, 20, 25, 30, 35, 40, 50, 60, 70, 80, 90, 100, 150, 200, 500, or 1000 nucleotides of SEQ ID No: 1 or the complements thereof, wherein said contiguous span comprises a G at position 15196, a deletion of an A at position 17170, a G at position 17829, an A at position 18011, and a T at position 18489.

Other particularly preferred primers and probes of the invention include isolated, purified, or recombinant polynucleotides comprising a contiguous span of at least 12, 15, 18, 20, 25, 30, 35, 40, 50, 60, 70, 80, 90, 100, 150, 200, 500, or 1000 nucleotides of SEQ ID No: 1 or the complements thereof, wherein said contiguous span comprises at least 1, 2, 3, 5, or 10 of nucleotide positions 1 to 4833 of SEQ ID No 1.

Another object of the invention is purified, isolated, or recombinant primers and probes comprising the nucleotide sequence of SEQ ID No 5, complementary sequences thereto, as well as allelic variants, and fragments thereof. Moreover, preferred primers and probes of the invention include purified, isolated, or recombinant APM1 cDNAs consisting of, consisting essentially of, or comprising the sequence of SEQ ID No: 5. Particularly preferred embodiments of the invention include isolated, purified, or recombinant polynucleotides comprising a contiguous span of at least 12, 15, 18, 20, 25, 30, 35, 40, 50, 60, 70, 80, 90, 100, 150, 200, 500, or 1000 nucleotides of SEQ ID No: 5 or the complements thereof, wherein said contiguous span comprises positions selected from the group consisting of a nucleotide T at the position 93, positions 1154-1157, a nucleotide G at the position 1997, positions 2083-2086, a nucleotide C at the position 2367, 2456, 2467, 2475, or 2631, a nucleotide A at the position 2778, positions 2785-2788, positions 2797-2801, a nucleotide T at the position 3594, a nucleotide G at the position 3684, positions 3697-3701, positions 4026-4027, a nucleotide T at the position 4053, 4078, 4533 or 4536 of SEQ ID No 5. Alternative preferred embodiments of the invention include isolated, purified, or recombinant polynucleotides comprising a contiguous span of at least 12, 15, 18, 20, 25, 30, 35, 40, 50, 60, 70, 80, 90, 100, 150, 200, 500, or 1000 nucleotides of SEQ ID No: 5 or the complements thereof, wherein said contiguous span comprises positions selected from the group consisting of a nucleotide G at the position 93, a nucleotide G at the position 1815, a nucleotide A at the position 1997, and a nucleotide T at position 2475 of SEQ ID NO: 5.

Other particularly preferred embodiments of the invention include isolated, purified, or recombinant polynucleotides comprising a contiguous span of at least 12, 15, 18, 20, 25, 30, 35, 40, 50, 60, 70, 80, 90, 100, 150, 200, 500, or 1000 nucleotides of SEQ ID No: 5 or the complements thereof, wherein said contiguous span comprises at least 1, 2, 3, 5, or 10 of nucleotide positions 1 to 22 of SEQ ID No 5.

In one embodiment the invention encompasses isolated, purified, and recombinant polynucleotides consisting of, or consisting essentially of a contiguous span of 8 to 50 nucleotides of SEQ ID No: 1 or the complement thereof, wherein said span includes an APM1-related biallelic marker in said sequence; optionally, wherein said APM1-related biallelic marker is selected from the group consisting of A1, A2, A3, A5, A6, A7, A9, A10, A11, A12, A13, A14, A15, A16, A17, A18, A19, A20, A21, A22, and A23; optionally, wherein said APM1-related biallelic marker is selected from the group consisting of A4, A8, A24, A25 and A26; optionally, wherein said APM1-related biallelic marker is selected from the group consisting of A1, A2, and A7 or the group consisting of A4 and A8; optionally, wherein said contiguous span is 18 to 35 nucleotides in length and said biallelic marker is within 4 nucleotides of the center of said polynucleotide; optionally, wherein said polynucleotide consists of said contiguous span and said contiguous span is 25 nucleotides in length and said biallelic marker is at the center of said polynucleotide; optionally, wherein the 3′ end of said contiguous span is present at the 3′ end of said polynucleotide; and optionally, wherein the 3′ end of said contiguous span is located at the 3′ end of said polynucleotide and said biallelic marker is present at the 3′ end of said polynucleotide.
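The probe geometry recited in this embodiment (a 25-nucleotide span with the marker at its center, or an 18 to 35 nucleotide span with the marker within 4 nucleotides of the center) can be illustrated with a minimal sketch. The helper names `centered_probe` and `marker_within_center` and the toy sequence are hypothetical and are not part of the disclosure; positions are assumed to be 1-based, as in a sequence listing.

```python
def centered_probe(seq: str, marker_pos: int, length: int = 25) -> str:
    """Return the `length`-nt window of `seq` whose central base is the marker.

    `marker_pos` is 1-based. An odd length is required so that a single
    base sits exactly at the center.
    """
    if length % 2 == 0:
        raise ValueError("use an odd length so the marker is exactly centered")
    half = length // 2
    start = marker_pos - 1 - half   # 0-based start of the window
    end = marker_pos + half         # 0-based exclusive end of the window
    if start < 0 or end > len(seq):
        raise ValueError("marker is too close to a sequence end for this length")
    return seq[start:end]


def marker_within_center(probe_len: int, marker_index: int, tolerance: int = 4) -> bool:
    """True if the marker (0-based index in the probe) lies within
    `tolerance` nucleotides of the probe center."""
    center = (probe_len - 1) / 2
    return abs(marker_index - center) <= tolerance


# Toy 40-nt sequence standing in for a stretch of genomic sequence (hypothetical).
toy = "ACGT" * 10
probe = centered_probe(toy, marker_pos=20)
```

Here `probe` is 25 nucleotides long and its 13th base is the base found at position 20 of the toy sequence. For even-length spans the center falls between two bases, so `marker_within_center` measures the distance to that midpoint; this convention is an assumption of the sketch.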

In another embodiment the invention encompasses isolated, purified and recombinant polynucleotides consisting of, or consisting essentially of a contiguous span of 8 to 50 nucleotides of SEQ ID No: 1 or the complement thereof, wherein the 3′ end of said contiguous span is located at the 3′ end of said polynucleotide, and wherein the 3′ end of said polynucleotide is located within 20 nucleotides upstream of an APM1-related biallelic marker in said sequence; optionally, wherein said APM1-related biallelic marker is selected from the group consisting of A1, A2, A3, A5, A6, A7, A9, A10, A11, A12, A13, A14, A15, A16, A17, A18, A19, A20, A21, A22, and A23, or wherein said APM1-related biallelic marker is selected from the group consisting of A4, A8, A24, A25 and A26; optionally, wherein said APM1-related biallelic marker is selected from the group consisting of A1, A2, and A7 or the group consisting of A4 and A8; optionally, wherein the 3′ end of said polynucleotide is located 1 nucleotide upstream of said APM1-related biallelic marker in said sequence; and optionally, wherein said polynucleotide consists essentially of a sequence selected from the following sequences: D1, D2, D3, D4, D5, D6, D7, D8, D9, D10, D11, D12, D13, D14, D15, D16, D17, D18, D19, D20, D21, D22, D23, D24, D25, D26, E1, E2, E3, E4, E5, E6, E7, E8, E9, E10, E11, E12, E13, E14, E15, E16, E17, E18, E19, E20, E21, E22, E23, E24, E25, and E26.
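The positional constraint in this embodiment, a primer whose 3′ end lies a set number of nucleotides upstream of the marker on either strand, can be sketched as follows. The function names, the `distance` parameter and the toy sequence are hypothetical illustrations, not the D or E primers of the invention; `distance=1` places the 3′ end immediately adjacent to the marker.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an unambiguous DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def forward_primer(seq: str, marker_pos: int, length: int = 19, distance: int = 1) -> str:
    """Primer on the given strand whose 3' end is `distance` nt upstream
    of the marker (`marker_pos` is 1-based)."""
    three_prime = marker_pos - distance          # 1-based position of the 3' base
    if three_prime - length < 0:
        raise ValueError("not enough upstream sequence for this primer length")
    return seq[three_prime - length:three_prime]


def reverse_primer(seq: str, marker_pos: int, length: int = 19, distance: int = 1) -> str:
    """Primer on the complementary strand whose 3' end is `distance` nt
    upstream of the marker when read along that strand."""
    start = marker_pos - 1 + distance            # 0-based index of the base pairing with the 3' end
    if start + length > len(seq):
        raise ValueError("not enough downstream sequence for this primer length")
    return reverse_complement(seq[start:start + length])


# Toy 40-nt sequence with a marker assumed at position 20 (hypothetical).
toy = "ACGT" * 10
fwd = forward_primer(toy, marker_pos=20, length=8)
rev = reverse_primer(toy, marker_pos=20, length=8)
```

With `distance=1`, extending either primer by a single nucleotide would incorporate the base at the marker position, which is the arrangement the last optional clause describes.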

In a further embodiment, the invention encompasses isolated, purified, or recombinant polynucleotides consisting of, or consisting essentially of a sequence selected from the following sequences: B1, B2, B3, B4, B5, B6, B7, B8, B9, B10, B11, B12, B13, B14, B15, B16, B17, B18, B19, B20, B21, B22, B23, C1, C2, C3, C4, C5, C6, C7, C8, C9, C10, C11, C12, C13, C14, C15, C16, C17, C18, C19, C20, C21, C22, C23, and C24.

In an additional embodiment, the invention encompasses polynucleotides for use in hybridization assays, sequencing assays, and allele-specific amplification assays for determining the identity of the nucleotide at an APM1-related biallelic marker in SEQ ID No: 1 or the complement thereof, as well as polynucleotides for use in amplifying segments of nucleotides comprising an APM1-related biallelic marker in SEQ ID No: 1 or the complement thereof; optionally, wherein said APM1-related biallelic marker is selected from the group consisting of A1, A2, A3, A5, A6, A7, A9, A10, A11, A12, A13, A14, A15, A16, A17, A18, A19, A20, A21, A22, and A23, or wherein said APM1-related biallelic marker is selected from the group consisting of A4, A8, A24, A25 and A26; optionally, wherein said APM1-related biallelic marker is selected from the group consisting of A1, A2, and A7 or the group consisting of A4 and A8.

The primers and probes can be prepared by any suitable method, including, for example, cloning and restriction of appropriate sequences and direct chemical synthesis by a method such as the phosphotriester method of Narang et al. (1979), the phosphodiester method of Brown et al. (1979), the diethylphosphoramidite method of Beaucage et al. (1981) and the solid support method described in EP 0 707 592. The disclosures of all these documents are incorporated herein by reference.

Detection probes are generally nucleic acid sequences or uncharged nucleic acid analogs such as, for example, peptide nucleic acids which are disclosed in International Patent Application WO 92/20702, and morpholino analogs which are described in U.S. Pat. Nos. 5,185,444, 5,034,506 and 5,142,047. The probe may have to be rendered “non-extendable” in that additional dNTPs cannot be added to the probe. In and of themselves analogs usually are non-extendable and nucleic acid probes can be rendered non-extendable by modifying the 3′ end of the probe such that the hydroxyl group is no longer capable of participating in elongation. For example, the 3′ end of the probe can be functionalized with the capture or detection label to thereby consume or otherwise block the hydroxyl group. Alternatively, the 3′ hydroxyl group can be cleaved, replaced or modified. U.S. patent application Ser. No. 07/049,061, filed Apr. 19, 1993, describes modifications which can be used to render a probe non-extendable.

Any of the polynucleotides of the present invention can be labeled, if desired, by incorporating a label detectable by spectroscopic, photochemical, biochemical, immunochemical, or chemical means. For example, useful labels include radioactive substances (³²P, ³⁵S, ³H, ¹²⁵I), fluorescent dyes (5-bromodeoxyuridine, fluorescein, acetylaminofluorene, digoxigenin) and biotin. Preferably, polynucleotides are labeled at their 3′ and/or 5′ ends. Examples of non-radioactive labeling of nucleic acid fragments are described in French patent No. FR-7810975 or by Urdea et al. (1988) or Sanchez-Pescador et al. (1988). In addition, the probes according to the present invention may have structural characteristics such that they allow signal amplification, such structural characteristics being, for example, branched DNA probes as those described by Urdea et al. in 1991 or in European patent No. EP 0 225 807 (Chiron).

A label can also be used to capture the primer, so as to facilitate the immobilization of either the primer or a primer extension product, such as amplified DNA, on a solid support. A capture label is attached to the primers or probes and can be a specific binding member which forms a binding pair with the solid phase reagent's specific binding member (e.g. biotin and streptavidin). Therefore, depending upon the type of label carried by a polynucleotide or a probe, it may be employed to capture or to detect the target DNA. Further, it will be understood that the polynucleotides, primers or probes provided herein may, themselves, serve as the capture label. For example, in the case where a solid phase reagent's binding member is a nucleic acid sequence, it may be selected such that it binds a complementary portion of a primer or probe to thereby immobilize the primer or probe to the solid phase. In cases where a polynucleotide probe itself serves as the binding member, those skilled in the art will recognize that the probe will contain a sequence or “tail” that is not complementary to the target. In the case where a polynucleotide primer itself serves as the capture label, at least a portion of the primer will be free to hybridize with a nucleic acid on a solid phase. DNA labeling techniques are well known to the skilled technician.

The probes of the present invention are useful for a number of purposes. They can notably be used in Southern hybridization to genomic DNA. The probes can also be used to detect PCR amplification products. They may also be used to detect mismatches in the APM1 gene or mRNA using other techniques well-known in the art.

Any of the polynucleotides, primers and probes of the present invention can be conveniently immobilized on a solid support. Solid supports are known to those skilled in the art and include the walls of wells of a reaction tray, test tubes, polystyrene beads, magnetic beads, nitrocellulose strips, membranes, microparticles such as latex particles, sheep (or other animal) red blood cells, duracytes and others. The solid support is not critical and can be selected by one skilled in the art. Thus, latex particles, microparticles, magnetic or non-magnetic beads, membranes, plastic tubes, walls of microtiter wells, glass or silicon chips, sheep (or other suitable animal's) red blood cells and duracytes are all suitable examples. Suitable methods for immobilizing nucleic acids on solid phases include ionic, hydrophobic, covalent interactions and the like. A solid support, as used herein, refers to any material which is insoluble, or can be made insoluble by a subsequent reaction. The solid support can be chosen for its intrinsic ability to attract and immobilize the capture reagent. Alternatively, the solid phase can retain an additional receptor which has the ability to attract and immobilize the capture reagent. The additional receptor can include a charged substance that is oppositely charged with respect to the capture reagent itself or to a charged substance conjugated to the capture reagent. As yet another alternative, the receptor molecule can be any specific binding member which is immobilized upon (attached to) the solid support and which has the ability to immobilize the capture reagent through a specific binding reaction. The receptor molecule enables the indirect binding of the capture reagent to a solid support material before the performance of the assay or during the performance of the assay. The solid phase thus can be a plastic, derivatized plastic, magnetic or non-magnetic metal, glass or silicon surface of a test tube, microtiter well, sheet, bead, microparticle, chip, sheep (or other suitable animal's) red blood cells, duracytes® and other configurations known to those of ordinary skill in the art. The polynucleotides of the invention can be attached to or immobilized on a solid support individually or in groups of at least 2, 5, 8, 10, 12, 15, 20, or 25 distinct polynucleotides of the invention to a single solid support. In addition, polynucleotides other than those of the invention may be attached to the solid support that one or more polynucleotides of the invention are attached to.

Consequently, the invention also deals with a method for detecting the presence of a nucleic acid comprising a nucleotide sequence selected from a group consisting of SEQ ID Nos 1-5, a fragment or a variant thereof or a complementary sequence thereto in a sample, said method comprising the following steps:

-   -   a) bringing into contact a nucleic acid probe or a plurality of nucleic acid probes that can hybridize with a nucleotide sequence included in a nucleic acid selected from the group consisting of the nucleotide sequences of SEQ ID Nos 1-5, a fragment or a variant thereof or a complementary sequence thereto and the sample to be assayed; and
    -   b) detecting the hybrid complex formed between the probe and a nucleic acid in the sample.

In a first preferred embodiment of this detection method, said nucleic acid probe or the plurality of nucleic acid probes are labeled with a detectable molecule. In a second preferred embodiment of said method, said nucleic acid probe or the plurality of nucleic acid probes has been immobilized on a substrate. In a third preferred embodiment, the nucleic acid probe or the plurality of nucleic acid probes comprise either a sequence which is selected from the group consisting of the nucleotide sequences of SEQ ID Nos B1 to B23, C1 to C24, D1 to D26 and E1 to E26 or a biallelic marker selected from the group consisting of A1 to A26 or the complements thereto.

The invention further concerns a kit for detecting the presence of a nucleic acid comprising a nucleotide sequence selected from a group consisting of SEQ ID Nos 1-5, a fragment or a variant thereof or a complementary sequence thereto in a sample, said kit comprising:

-   -   a) a nucleic acid probe or a plurality of nucleic acid probes that can hybridize with a nucleotide sequence included in a nucleic acid selected from the group consisting of the nucleotide sequences of SEQ ID Nos 1-5, a fragment or a variant thereof or a complementary sequence thereto;
    -   b) optionally, the reagents necessary for performing the hybridization reaction.

In a first preferred embodiment of the detection kit, the nucleic acid probe or the plurality of nucleic acid probes are labeled with a detectable molecule. In a second preferred embodiment of the detection kit, the nucleic acid probe or the plurality of nucleic acid probes has been immobilized on a substrate. In a third preferred embodiment of the detection kit, the nucleic acid probe or the plurality of nucleic acid probes comprise either a sequence which is selected from the group consisting of the nucleotide sequences of SEQ ID Nos B1 to B23, C1 to C24, D1 to D26 and E1 to E26 or a biallelic marker selected from the group consisting of A1 to A26 or the complements thereto.

Oligonucleotide Arrays

A substrate comprising a plurality of oligonucleotide primers or probes of the invention may be used either for detecting or amplifying targeted sequences in the APM1 gene and may also be used for detecting mutations in the coding or in the non-coding sequences of the APM1 gene.

Any polynucleotide provided herein may be attached in overlapping areas or at random locations on the solid support. Alternatively the polynucleotides of the invention may be attached in an ordered array wherein each polynucleotide is attached to a distinct region of the solid support which does not overlap with the attachment site of any other polynucleotide. Preferably, such an ordered array of polynucleotides is designed to be “addressable” where the distinct locations are recorded and can be accessed as part of an assay procedure. Addressable polynucleotide arrays typically comprise a plurality of different oligonucleotide probes that are coupled to a surface of a substrate in different known locations. The knowledge of the precise location of each polynucleotide makes these “addressable” arrays particularly useful in hybridization assays. Any addressable array technology known in the art can be employed with the polynucleotides of the invention. One particular embodiment of these polynucleotide arrays, known as Genechips™, has been generally described in U.S. Pat. No. 5,143,854 and PCT publications WO 90/15070 and WO 92/10092. These arrays may generally be produced using mechanical synthesis methods or light directed synthesis methods which incorporate a combination of photolithographic methods and solid phase oligonucleotide synthesis (Fodor et al., 1991). The immobilization of arrays of oligonucleotides on solid supports has been rendered possible by the development of a technology generally identified as “Very Large Scale Immobilized Polymer Synthesis” (VLSIPS™) in which, typically, probes are immobilized in a high density array on a solid surface of a chip. Examples of VLSIPS™ technologies are provided in U.S. Pat. Nos. 5,143,854 and 5,412,087 and in PCT Publications WO 90/15070, WO 92/10092 and WO 95/11995, which describe methods for forming oligonucleotide arrays through techniques such as light-directed synthesis techniques. In designing strategies aimed at providing arrays of nucleotides immobilized on solid supports, further presentation strategies were developed to order and display the oligonucleotide arrays on the chips in an attempt to maximize hybridization patterns and sequence information. Examples of such presentation strategies are disclosed in PCT Publications WO 94/12305, WO 94/11530, WO 97/29212 and WO 97/31256.

In another embodiment of the oligonucleotide arrays of the invention, an oligonucleotide probe matrix may advantageously be used to detect mutations occurring in the APM1 gene and preferably in its regulatory region. For this particular purpose, probes are specifically designed to have a nucleotide sequence allowing their hybridization to the genes that carry known mutations (either by deletion, insertion or substitution of one or several nucleotides). By known mutations is meant mutations on the APM1 gene that have been identified according, for example, to the technique used by Huang et al. (1996) or Samson et al. (1996).

Another technique that is used to detect mutations in the APM1 gene is a high-density DNA array. Each oligonucleotide probe constituting a unit element of the high density DNA array is designed to match a specific subsequence of the APM1 genomic DNA or cDNA. Thus, an array of wild-type APM1 oligonucleotides complementary to subsequences of the target gene sequence is used to determine the identity of the target sequence, measure its amount, and detect differences between the target sequence and the reference sequence. One such design (the 4L tiled array) uses a set of four probes (A, C, G, T), preferably 15-nucleotide oligomers, for each position of the reference sequence. In each set of four probes, the perfect complement will hybridize more strongly than mismatched probes. Consequently, a nucleic acid target of length L is scanned for mutations with a tiled array containing 4L probes, the whole probe set containing all the possible mutations in the known wild-type reference sequence. The hybridization signals of the 15-mer probe set tiled array are perturbed by a single base change in the target sequence. As a consequence, there is a characteristic loss of signal or a “footprint” for the probes flanking a mutation position. This technique was described by Chee et al. in 1996, which is herein incorporated by reference.
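The 4L tiling scheme can be sketched in a few lines. This is an illustration under stated assumptions rather than the layout of Chee et al.: the function name `tiled_probe_sets`, the choice of the central base of each 15-mer as the interrogated position, and the toy reference sequence are all hypothetical. Positions too close to either end of the reference lack full flanking sequence and are skipped, so the sketch yields 4 × (L − 14) probes rather than exactly 4L.

```python
BASES = "ACGT"
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an unambiguous DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def tiled_probe_sets(reference: str, probe_len: int = 15) -> dict:
    """Build one set of four probes per interrogated reference position.

    Returns {position (1-based): {base: probe}}, where each probe is the
    reverse complement of the reference window with `base` substituted at
    the central position. The probe keyed by the reference base is the
    perfect complement; the other three carry a central mismatch.
    """
    if probe_len % 2 == 0:
        raise ValueError("use an odd probe length so one base is central")
    half = probe_len // 2
    sets = {}
    for i in range(half, len(reference) - half):
        window = reference[i - half:i + half + 1]
        sets[i + 1] = {
            base: reverse_complement(window[:half] + base + window[half + 1:])
            for base in BASES
        }
    return sets


# Toy 20-nt reference (hypothetical, not taken from SEQ ID No: 1).
reference = "ACGTTGCAAGGCTTACGATC"
probe_sets = tiled_probe_sets(reference)
```

For a target carrying a single substitution, the perfect-match probe at that position changes, and every neighbouring probe set whose window overlaps the substitution now carries a mismatch, which is the source of the “footprint” described above.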

Consequently, the invention concerns an array of nucleic acid molecules comprising at least one polynucleotide described above as probes and primers. Preferably, the invention concerns an array of nucleic acid sequences comprising at least two polynucleotides described above as probes and primers.

A further object of the invention consists of an array of nucleic acid sequences comprising either at least one of the sequences selected from the group consisting of SEQ ID Nos B1 to B23, C1 to C24, D1 to D26 and E1 to E26 or the sequences complementary thereto or a fragment thereof, or at least 8, 10, 12, 15, 18, 20, 25, 30, or 40 consecutive nucleotides thereof, or at least one sequence comprising a biallelic marker selected from the group consisting of A1 to A26 or the complements thereto.

The invention also pertains to an array of nucleic acid sequences comprising either at least two of the sequences selected from the group consisting of SEQ ID Nos B1 to B23, C1 to C24, D1 to D26 and E1 to E26 or the sequences complementary thereto or a fragment thereof, or at least 8 consecutive nucleotides thereof, or at least two sequences comprising a biallelic marker selected from the group consisting of A1 to A26 or the complements thereto.

APM1 Proteins and Polypeptide Fragments:

The term “APM1 polypeptides” is used herein to embrace all of the proteins and polypeptides of the present invention. Also forming part of the invention are polypeptides encoded by the polynucleotides of the invention, as well as fusion polypeptides comprising such polypeptides. The invention embodies APM1 proteins from humans, including isolated or purified APM1 proteins consisting of, consisting essentially of, or comprising the sequence of SEQ ID NO: 6. It should be noted that the APM1 proteins of the invention are based on the naturally-occurring variant of the amino acid sequence of human APM1, wherein the aspartic acid residue of amino acid position 56 has been replaced with a glutamic acid residue. This variant protein and the fragments thereof which contain amino acid position 56 of SEQ ID NO: 6 are collectively referred to herein as “56-Glu variants.”

The present invention embodies isolated, purified, and recombinant polypeptides comprising a contiguous span of at least 6 amino acids, preferably at least 8 to 10 amino acids, more preferably at least 12, 15, 20, 25, 30, 40, 50, or 100 amino acids of SEQ ID NO: 6, wherein said contiguous span includes a glutamic acid residue at amino acid position 56 in SEQ ID NO: 6. In other preferred embodiments the contiguous stretch of amino acids comprises the site of a mutation or functional mutation, including a deletion, addition, swap or truncation of the amino acids in the APM1 protein sequence.

APM1 proteins are preferably isolated from human or mammalian tissue samples or expressed from human or mammalian genes. The APM1 polypeptides of the invention can be made using routine expression methods known in the art. The polynucleotide encoding the desired polypeptide is ligated into an expression vector suitable for any convenient host. Both eukaryotic and prokaryotic host systems are used in forming recombinant polypeptides, and a summary of some of the more common systems is provided herein. The polypeptide is then isolated from lysed cells or from the culture medium and purified to the extent needed for its intended use. Purification is by any technique known in the art, for example, differential extraction, salt fractionation, chromatography, centrifugation, and the like. See, for example, Methods in Enzymology for a variety of methods for purifying proteins.

In addition, shorter protein fragments can be produced by chemical synthesis. Alternatively the proteins of the invention are extracted from cells or tissues of humans or non-human animals. Methods for purifying proteins are known in the art, and include the use of detergents or chaotropic agents to disrupt particles followed by differential extraction and separation of the polypeptides by ion exchange chromatography, affinity chromatography, sedimentation according to density, and gel electrophoresis, for example.

Any APM1 cDNA, including SEQ ID NO: 5, can be used to express APM1 proteins and polypeptides. The nucleic acid encoding the APM1 protein or polypeptide to be expressed is operably linked to a promoter in an expression vector using conventional cloning technology. The APM1 insert in the expression vector may comprise the full coding sequence for the APM1 protein or a portion thereof. For example, the APM1 derived insert may encode a polypeptide comprising at least 10 consecutive amino acids of the APM1 protein of SEQ ID NO: 6, wherein said consecutive amino acids comprise a glutamic acid residue in amino acid position 56.

The expression vector is any of the mammalian, yeast, insect or bacterial expression systems known in the art. Commercially available vectors and expression systems are available from a variety of suppliers including Genetics Institute (Cambridge, Mass.), Stratagene (La Jolla, Calif.), Promega (Madison, Wis.), and Invitrogen (San Diego, Calif.). If desired, to enhance expression and facilitate proper protein folding, the codon context and codon pairing of the sequence can be optimized for expression in the organism in which the expression vector is introduced, as explained by Hatfield, et al., U.S. Pat. No. 5,082,767.

In one embodiment, the entire coding sequence of the APM1 cDNA through the poly A signal of the cDNA is operably linked to a promoter in the expression vector. Alternatively, if the nucleic acid encoding a portion of the APM1 protein lacks a methionine to serve as the initiation site, an initiating methionine can be introduced next to the first codon of the nucleic acid using conventional techniques. Similarly, if the insert from the APM1 cDNA lacks a poly A signal, this sequence can be added to the construct by, for example, splicing out the poly A signal from pSG5 (Stratagene) using BglI and SalI restriction endonuclease enzymes and incorporating it into the mammalian expression vector pXT1 (Stratagene). pXT1 contains the LTRs and a portion of the gag gene from Moloney Murine Leukemia Virus. The position of the LTRs in the construct allows efficient stable transfection. The vector includes the Herpes Simplex Thymidine Kinase promoter and the selectable neomycin gene. The nucleic acid encoding the APM1 protein or a portion thereof is obtained by PCR from a bacterial vector containing the APM1 cDNA of SEQ ID NO: 5. The oligonucleotide primers used are complementary to the APM1 cDNA, or a portion thereof, and contain restriction endonuclease sequences for PstI incorporated into the 5′ primer and BglII at the 5′ end of the corresponding cDNA 3′ primer, taking care to ensure that the sequence encoding the APM1 protein, or portion thereof, is positioned properly with respect to the poly A signal. The purified fragment obtained from the resulting PCR reaction is digested with PstI, blunt ended with an exonuclease, digested with BglII, purified, and ligated to pXT1.

The ligated product is transfected into mouse NIH 3T3 cells using Lipofectin (Life Technologies, Inc., Grand Island, N.Y.) under conditions outlined in the product specification. Positive transfectants are selected after growing the transfected cells in 600 μg/mL G418 (Sigma, St. Louis, Mo.).

Alternatively, the nucleic acids encoding the APM1 protein or a portion thereof are cloned into pED6dpc2 (Genetics Institute, Cambridge, Mass.). The resulting pED6dpc2 constructs are transfected into a suitable host cell, such as COS 1 cells. Methotrexate resistant cells are selected and expanded.

The above procedures may also be used to express a mutant APM1 protein responsible for a detectable phenotype or a portion thereof.

The expressed proteins are purified using conventional purification techniques such as ammonium sulfate precipitation or chromatographic separation based on size or charge. The protein encoded by the nucleic acid insert may also be purified using standard immunochromatography techniques. In such procedures, a solution containing the expressed APM1 protein or portion thereof, such as a cell extract, is applied to a column having antibodies against the APM1 protein or portion thereof attached to the chromatography matrix. The expressed protein is allowed to bind to the immunochromatography column. Thereafter, the column is washed to remove non-specifically bound proteins. The specifically bound expressed protein is then released from the column and recovered using standard techniques.

To confirm expression of the APM1 protein or a portion thereof, the proteins expressed in host cells containing an expression vector containing an insert encoding the APM1 protein or a portion thereof can be compared to the proteins expressed in host cells containing the expression vector without an insert. The presence of a band in samples from cells containing the expression vector with an insert which is absent in samples from cells containing the expression vector without an insert indicates that the APM1 protein or a portion thereof is being expressed. Generally, the band will have the mobility expected for the APM1 protein or portion thereof. However, the band may have a mobility different than that expected as a result of modifications such as glycosylation, ubiquitination, or enzymatic cleavage.

Antibodies capable of specifically recognizing the expressed APM1 protein or a portion thereof are described below.

If antibody production is not possible, the nucleic acids encoding the APM1 protein or a portion thereof can be incorporated into expression vectors designed for use in purification schemes employing chimeric polypeptides. In such strategies, the nucleic acid encoding the APM1 protein or a portion thereof is inserted in frame with a gene encoding the other half of the chimera. The other half of the chimera can be β-globin or a nickel binding polypeptide encoding sequence, for example. A chromatography matrix having antibody to β-globin or nickel attached thereto can then be used to purify the chimeric protein. Protease cleavage sites are engineered between the β-globin gene or the nickel binding polypeptide and the APM1 protein or portion thereof. Thus, the two polypeptides of the chimera are separated from one another by protease digestion.

One useful expression vector for generating β-globin chimerics is pSG5 (Stratagene), which encodes rabbit β-globin. Intron II of the rabbit β-globin gene facilitates splicing of the expressed transcript, and the polyadenylation signal incorporated into the construct increases the level of expression. These techniques are well known to those skilled in the art of molecular biology. Standard methods are published in methods texts such as Davis et al., (Basic Methods in Molecular Biology, L. G. Davis, M. D. Dibner, and J. F. Battey, ed., Elsevier Press, NY, 1986) and many of the methods are available from Stratagene, Life Technologies, Inc., or Promega. Polypeptides may additionally be produced from the construct using in vitro translation systems such as the In vitro Express™ Translation Kit (Stratagene).

Antibodies That Bind APM1 Polypeptides of the Invention

Any APM1 polypeptide or whole protein may be used to generate antibodies capable of specifically binding to expressed APM1 protein or fragments thereof as described. The antibody compositions of the invention are capable of specifically binding or specifically bind to the 56-Glu variant of the APM1 protein. For an antibody composition to specifically bind to the 56-Glu variant of APM1 it must demonstrate at least a 5%, 10%, 15%, 20%, 25%, 50%, or 100% greater binding affinity for the 56-Glu variant of APM1 than for the 56-Asp variant of APM1 in an ELISA, RIA, or other antibody-based binding assay.
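The percentage criterion above reduces to simple arithmetic, sketched below. The function names and the example readings are hypothetical, and using a raw assay signal as a stand-in for binding affinity is an assumption of the sketch; in practice affinity would be derived from the ELISA or RIA data by whatever method the assay calls for.

```python
# Thresholds recited in the text, in percent.
THRESHOLDS = (5, 10, 15, 20, 25, 50, 100)


def percent_greater(glu_value: float, asp_value: float) -> float:
    """Percentage by which binding to the 56-Glu variant exceeds binding
    to the 56-Asp variant, measured in the same assay and units."""
    if asp_value <= 0:
        raise ValueError("56-Asp reference value must be positive")
    return (glu_value - asp_value) / asp_value * 100.0


def thresholds_met(glu_value: float, asp_value: float) -> list:
    """Return the recited thresholds satisfied by the two measurements."""
    gain = percent_greater(glu_value, asp_value)
    return [t for t in THRESHOLDS if gain >= t]


# Hypothetical readings: 150 units for 56-Glu versus 100 units for 56-Asp.
met = thresholds_met(150, 100)
```

With these illustrative readings the 56-Glu value is 50% greater, so every recited threshold up to and including 50% is satisfied while the 100% threshold is not.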

In a preferred embodiment of the invention antibody compositions are capable of selectively binding, or selectively bind to an epitope-containing fragment of a polypeptide comprising a contiguous span of at least 6 amino acids, preferably at least 8 to 10 amino acids, more preferably at least 12, 15, 20, 25, 30, 40, 50, or 100 amino acids of SEQ ID NO: 6, wherein said epitope comprises a glutamic acid residue at amino acid position 56 in SEQ ID NO: 6, wherein said antibody composition is optionally either polyclonal or monoclonal.

The present invention also contemplates the use of polypeptides comprising a contiguous span of at least 6 amino acids, preferably at least 8 to 10 amino acids, more preferably at least 12, 15, 20, 25, 50, or 100 amino acids of an APM1 polypeptide in the manufacture of antibodies, wherein said contiguous span comprises a glutamic acid residue at amino acid position 56 of SEQ ID NO: 6. In a preferred embodiment such polypeptides are useful in the manufacture of antibodies to detect the presence and absence of the 56-Glu variant.

Non-human animals or mammals, whether wild-type or transgenic, which express a different species of APM1 than the one to which antibody binding is desired, and animals which do not express APM1 (i.e. an APM1 knock out animal as described herein) are particularly useful for preparing antibodies. APM1 knock out animals will recognize all or most of the exposed regions of APM1 as foreign antigens, and therefore produce antibodies with a wider array of APM1 epitopes. Moreover, smaller polypeptides with only 10 to 30 amino acids may be useful in obtaining specific binding to the 56-Glu variant. In addition, the humoral immune system of animals which produce a species of APM1 that resembles the antigenic sequence will preferentially recognize the differences between the animal's native APM1 species and the antigen sequence, and produce antibodies to these unique sites in the antigen sequence. Such a technique will be particularly useful in obtaining antibodies that specifically bind to the 56-Glu variant.

Amplification of the APM1 Gene

1. DNA Extraction

As for the source of the genomic DNA to be subjected to analysis, almost any test sample can be used without any particular limitation. These test samples include biological samples which can be tested by the methods of the present invention described herein and include human and animal body fluids such as whole blood, serum, plasma, cerebrospinal fluid, urine, lymph fluids, and various external secretions of the respiratory, intestinal and genitourinary tracts, tears, saliva, milk, white blood cells, myelomas and the like; biological fluids such as cell culture supernatants; fixed tissue specimens including tumor and non-tumor tissue and lymph node tissues; bone marrow aspirates and fixed cell specimens. The preferred source of genomic DNA used in the context of the present invention is the peripheral venous blood of each donor.

The techniques of DNA extraction are well-known to the technician of ordinary skill in the art. Such techniques are described notably by Lin et al. (1998) and by Mackey et al. (1998).

2. DNA Amplification

DNA amplification techniques are well-known to those skilled in the art. Amplification techniques that can be used in the context of the present invention include, but are not limited to, the ligase chain reaction (LCR) described in EP-A-320 308, WO 9320227 and EP-A-439 182, the disclosures of which are incorporated herein by reference, the polymerase chain reaction (PCR, RT-PCR) and techniques such as the nucleic acid sequence based amplification (NASBA) described in Guatelli J. C., et al. (1990) and in Compton J. (1991), Q-beta amplification as described in European Patent Application No 4544610, strand displacement amplification as described in Walker et al. (1996) and EP A 684 315, and target mediated amplification as described in PCT Publication WO 9322461, the disclosures of which are incorporated herein by reference. For amplification of mRNAs, it is within the scope of the present invention to reverse transcribe mRNA into cDNA followed by polymerase chain reaction (RT-PCR); or to use a single enzyme for both steps as described in U.S. Pat. No. 5,322,770; or to use Asymmetric Gap LCR (RT-AGLCR) as described by Marshall et al. (1994). AGLCR is a modification of GLCR that allows the amplification of RNA.

PCR technology is the preferred amplification technique used in the present invention. A variety of PCR techniques are familiar to those skilled in the art. For a review of PCR technology, see White (1997) and the publication entitled “PCR Methods and Applications” (1991, Cold Spring Harbor Laboratory Press). In each of these PCR procedures, PCR primers on either side of the nucleic acid sequences to be amplified are added to a suitably prepared nucleic acid sample along with dNTPs and a thermostable polymerase such as Taq polymerase, Pfu polymerase, or Vent polymerase. The nucleic acid in the sample is denatured and the PCR primers are specifically hybridized to complementary nucleic acid sequences in the sample. The hybridized primers are extended. Thereafter, another cycle of denaturation, hybridization, and extension is initiated. The cycles are repeated multiple times to produce an amplified fragment containing the nucleic acid sequence between the primer sites. PCR has further been described in several patents including U.S. Pat. Nos. 4,683,195, 4,683,202 and 4,965,188. Each of these publications is incorporated herein by reference.
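As an illustration only, the relationship between a primer pair and the fragment "between the primer sites" can be sketched in silico. The sketch below is not part of the disclosed method; it assumes exact primer matches on a single template strand, and the function names and toy sequences are hypothetical (they are not APM1 sequence).

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def in_silico_amplicon(template: str, forward: str, reverse: str):
    """Return the predicted amplicon, or None if a primer site is missing.

    Assumption: primers match exactly. The forward primer is searched for
    on the given strand; the reverse primer anneals to that strand, so its
    reverse complement is searched for downstream of the forward primer.
    """
    template = template.upper()
    start = template.find(forward.upper())
    if start == -1:
        return None
    reverse_site = reverse_complement(reverse)
    end = template.find(reverse_site, start + len(forward))
    if end == -1:
        return None
    return template[start:end + len(reverse_site)]


# Hypothetical toy template and primers for illustration.
toy_template = "TTGACCTGAAGCTTACGGATCCATGCAAGTCCGTTAGG"
toy_forward = "GACCTGAAG"
toy_reverse = reverse_complement("CAAGTCCGT")
amplicon = in_silico_amplicon(toy_template, toy_forward, toy_reverse)
```

A real primer check would also have to consider mismatches, melting temperature and both strands, none of which this sketch attempts.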

One of the aspects of the present invention is a method for the amplification of the human APM1 gene, particularly of the genomic sequence of SEQ ID No 1 or of the cDNA sequence of SEQ ID No 5, or a fragment or a variant thereof in a test sample, preferably using the PCR technology. This method comprises the steps of contacting a test sample suspected of containing the target APM1 encoding sequence or portion thereof with amplification reaction reagents comprising a pair of amplification primers, and optionally, in some instances, a detection probe that can hybridize with an internal region of amplicon sequences to confirm that the desired amplification reaction has taken place.

Thus, the present invention also relates to a method for the amplification of a human APM1 gene sequence, particularly of a portion of the genomic sequence of SEQ ID No 1 or of the cDNA sequence of SEQ ID No 5, or a variant thereof in a test sample, said method comprising the steps of:

-   a) contacting a test sample suspected of containing the targeted APM1 gene sequence comprised in a nucleotide sequence selected from a group consisting of SEQ ID Nos 1 and 5, or fragments or variants thereof, with amplification reaction reagents comprising a pair of amplification primers as described above and located on either side of the polynucleotide region to be amplified, and
-   b) optionally, detecting the amplification products.

In a first preferred embodiment of the above amplification method, the amplification product is detected by hybridization with a labeled probe having a sequence which is complementary to the amplified region. In a second preferred embodiment, the nucleic acid primers comprise a sequence which is selected from the group consisting of B1 to B23, C1 to C24, D1 to D26 and E1 to E26. The primers are more particularly characterized in that they have sufficient complementarity with any sequence of a strand of the genomic sequence close to the region to be amplified, for example with a non-coding sequence adjacent to the exons to be amplified.

The invention also concerns a kit for the amplification of a human APM1 gene sequence, particularly of a portion of the genomic sequence of SEQ ID No 1 or of the cDNA sequence of SEQ ID No 5, or a variant thereof in a test sample, wherein said kit comprises:

-   a) a pair of oligonucleotide primers located on either side of the APM1 region to be amplified;
-   b) optionally, the reagents necessary for performing the amplification reaction.

In one embodiment of the above amplification kit, the amplification product is detected by hybridization with a labeled probe having a sequence which is complementary to the amplified region.

In another embodiment of the above amplification kit, primers comprise a sequence which is selected from the group consisting of B1 to B23, C1 to C24, D1 to D26 and E1 to E26.

APM1-Related Biallelic Markers

Biallelic markers generally consist of a polymorphism at one single base position. Each biallelic marker therefore corresponds to two forms of a polynucleotide sequence which, when compared with one another, present a nucleotide modification at one position. Usually, the nucleotide modification involves the substitution of one nucleotide for another (for example C instead of T).
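The comparison of two allelic forms described above can be sketched as a position-by-position scan of two aligned sequences. This is an illustrative sketch only; the function name and the toy sequences are hypothetical, and it assumes equal-length, already aligned sequences that differ by substitution only.

```python
def single_base_differences(form_1: str, form_2: str):
    """List (1-based position, base in form 1, base in form 2) for each mismatch.

    Assumption: the two forms are pre-aligned and of equal length, so
    insertions and deletions are out of scope for this sketch.
    """
    if len(form_1) != len(form_2):
        raise ValueError("sequences must be aligned and of equal length")
    return [
        (index + 1, base_1, base_2)
        for index, (base_1, base_2) in enumerate(zip(form_1.upper(), form_2.upper()))
        if base_1 != base_2
    ]


# Hypothetical toy sequences: a T in one form, a C in the other.
toy_form_1 = "ACGTTCGA"
toy_form_2 = "ACGTCCGA"
differences = single_base_differences(toy_form_1, toy_form_2)
```

For the toy pair, the scan reports a single substitution at position 5 (T in the first form, C in the second), which is the pattern expected for a biallelic marker.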

Advantages of the Biallelic Markers of the Present Invention

The APM1-related biallelic markers of the present invention offer a number of important advantages over other genetic markers such as RFLP (Restriction Fragment Length Polymorphism) and VNTR (Variable Number of Tandem Repeats) markers.

The first generation of markers were RFLPs, which are variations that modify the length of a restriction fragment. However, the methods used to identify and to type RFLPs are relatively wasteful of materials, effort, and time. The second generation of genetic markers were VNTRs, which can be categorized as either minisatellites or microsatellites. Minisatellites are tandemly repeated DNA sequences present in units of 5-50 repeats which are distributed along regions of the human chromosomes ranging from 0.1 to 20 kilobases in length. Since they present many possible alleles, their informative content is very high. Minisatellites are scored by performing Southern blots to identify the number of tandem repeats present in a nucleic acid sample from the individual being tested. However, there are only 10 potential VNTRs that can be typed by Southern blotting. Moreover, both RFLP and VNTR markers are costly and time-consuming to develop and assay in large numbers.

Single nucleotide polymorphisms (SNPs) or biallelic markers can be used in the same manner as RFLPs and VNTRs but offer several advantages. SNPs are densely spaced in the human genome and represent the most frequent type of variation. An estimated 10⁷ or more sites are scattered along the 3×10⁹ base pairs of the human genome. Therefore, SNPs occur at a greater frequency and with greater uniformity than RFLP or VNTR markers, which means that there is a greater probability that such a marker will be found in close proximity to a genetic locus of interest. SNPs are less variable than VNTR markers but are mutationally more stable.
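The density figures above reduce to simple arithmetic. The short sketch below works it through; the variable names are illustrative, and the result is only an average that assumes the sites are spread evenly, which real genomes do not guarantee.

```python
# Figures quoted in the text: at least 10^7 SNP sites over 3 x 10^9 base pairs.
snp_sites = 10**7
genome_bp = 3 * 10**9

# Average spacing between sites, assuming an even spread (an idealization).
average_spacing_bp = genome_bp // snp_sites
```

With these figures the average spacing is 300 base pairs. Because the text gives 10⁷ as a lower bound on the number of sites, 300 base pairs is an upper bound on the average spacing.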

Also, the different forms of a characterized single nucleotide polymorphism, such as the biallelic markers of the present invention, are often easier to distinguish and can therefore be typed easily on a routine basis. Biallelic markers have single nucleotide based alleles and they have only two common alleles, which allows highly parallel detection and automated scoring. The biallelic markers of the present invention offer the possibility of rapid, high throughput genotyping of a large number of individuals.

Biallelic markers are densely spaced in the genome, sufficiently informative and can be assayed in large numbers. The combined effects of these advantages make biallelic markers extremely valuable in genetic studies. Biallelic markers can be used in linkage studies in families, in allele sharing methods, in linkage disequilibrium studies in populations, and in association studies of case-control populations or of trait positive and trait negative populations. An important aspect of the present invention is that biallelic markers allow association studies to be performed to identify genes involved in complex traits. Association studies examine the frequency of marker alleles in unrelated case and control populations and are generally employed in the detection of polygenic or sporadic traits. Association studies may be conducted within the general population and are not limited to studies performed on related individuals in affected families (linkage studies). Biallelic markers in different genes can be screened in parallel for direct association with disease or response to a treatment. This multiple gene approach is a powerful tool for a variety of human genetic studies as it provides the necessary statistical power to examine the synergistic effect of multiple genetic factors on a particular phenotype, drug response, sporadic trait, or disease state with a complex genetic etiology.
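One common way to compare marker allele frequencies between case and control populations is a Pearson chi-square test on a 2×2 table of allele counts. The sketch below is a generic illustration under that assumption, not the statistical method of the invention; the function name and the counts are hypothetical, and no continuity or multiple-testing correction is applied.

```python
import math


def allele_association_chi2(case_a1, case_a2, control_a1, control_a2):
    """Pearson chi-square (1 degree of freedom) on a 2x2 allele-count table.

    Returns (chi2, p_value). For 1 degree of freedom the upper-tail
    probability equals erfc(sqrt(chi2 / 2)).
    """
    table = [[case_a1, case_a2], [control_a1, control_a2]]
    row_totals = [sum(row) for row in table]
    col_totals = [case_a1 + control_a1, case_a2 + control_a2]
    total = sum(row_totals)
    if 0 in row_totals or 0 in col_totals:
        raise ValueError("every row and column needs at least one observation")
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    return chi2, math.erfc(math.sqrt(chi2 / 2))


# Hypothetical allele counts (two alleles counted per individual).
chi2_value, p_value = allele_association_chi2(60, 40, 40, 60)
```

Counting alleles rather than individuals treats the two chromosomes of each subject as independent observations, which is itself an assumption that a real study would need to justify.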

Candidate Gene of the Present Invention

Different approaches can be employed to perform association studies: genome-wide association studies, candidate region association studies and candidate gene association studies. Genome-wide association studies rely on the screening of genetic markers evenly spaced and covering the entire genome. The candidate gene approach is based on the study of genetic markers specifically located in genes potentially involved in a biological pathway related to the trait of interest. In the present invention, APM1 is the candidate gene. Indeed, the APM1 gene seems to be involved in obesity and in other disorders linked to obesity. The candidate gene analysis clearly provides a short-cut approach to the identification of genes and gene polymorphisms related to a particular trait when some information concerning the biology of the trait is available. However, it should be noted that all of the biallelic markers disclosed in the instant application can be employed as part of genome-wide association studies or as part of candidate region association studies, and such uses are specifically contemplated in the present invention and claims.

APM1-Related Biallelic Markers and Polynucleotides Related Thereto

The invention also concerns APM1-related biallelic markers. As used herein, the term “APM1-related biallelic marker” relates to a set of biallelic markers in linkage disequilibrium with the APM1 gene. The term APM1-related biallelic marker includes the biallelic markers designated A1 to A26.

A portion of the biallelic markers of the present invention are disclosed in Tables A and B. Their location on the APM1 gene is indicated in Tables A and B and also as a single base polymorphism in the features of SEQ ID No 1. The pairs of primers allowing the amplification of a nucleic acid containing the polymorphic base of one APM1 biallelic marker are listed in Table 1 of Example 2.

TABLE A. List of biallelic markers surrounded by sequence that has never been previously suggested in the art.

Marker  Marker name           Localization in APM1 gene  Allele 1           Allele 2        Frequency of allele 2  Position in SEQ ID No 1
------  --------------------  -------------------------  -----------------  --------------  ---------------------  -----------------------
A1      9-27/261              5′ regulatory region       G                  C                                      3787
A2      99-14387/129          Intron 1                   A                  C                                      11118
A3      9-12/48               Intron 1                   T                  C               1.5%                   15120
A5      9-12/355 or 9-13/297  Intron 2                   G                  T               26%                    15427
A6      9-12/428 or 9-13/370  Intron 2                   A                  G               11%                    15500
A7      99-14405/105          Intron 2                   G                  A               37%                    15863
A9      17-30-216             5′ regulatory region       G                  A                                      945
A10     9-27-211              5′ regulatory region       A                  G                                      3738
A11     9-27-246              5′ regulatory region       G                  A                                      3773
A12     17-31-298             Intron 1                   A                  G                                      5095
A13     17-31-413             Intron 1                   T                  C                                      5210
A14     17-32-24              Intron 1                   T                  C                                      10637
A15     99-14387-50           Intron 1                   C                  A                                      11039
A16     99-14387-199          Intron 1                   A                  G                                      11188
A17     17-33-                Intron 1                   no TGAGACT insert  TGAGACT insert                         13973
A18     17-34-860             Intron 1                   G                  A                                      14702
A19     17-34-915             Intron 1                   G                  A                                      14757
A20     17-35-71              Intron 1                   C                  T                                      14815
A21     17-35-306             Intron 1                   G                  T                                      15050
A22     17-36-47              Intron 2                   G                  C                                      15680
A23     17-36-120             Intron 2                   C                  T                                      15790

TABLE B. List of biallelic markers surrounded by previously suggested sequence, where one allele, allele 2, has never been previously suggested in the art. The identity of the nucleotide of the original allele has been previously suggested in the art.

Marker  Marker name           Localization in APM1 gene  Allele 1           Allele 2        Frequency of allele 2  Position in SEQ ID No 1
------  --------------------  -------------------------  -----------------  --------------  ---------------------  -----------------------
A4      9-12/124 or 9-13/66   Exon 2                     T                  G               11.5%                  15196
A8      9-16/189              Exon 3                     A                  Del             40%                    17170
A24     17-37-629             Exon 3                     A                  G                                      17829
A25     17-37-811             Exon 3                     G                  A                                      18011
A26     17-38-349             Exon 3                     C                  T                                      18489

The invention also relates to a purified and/or isolated nucleotide sequence comprising a polymorphic base of a biallelic marker located in the sequence of the APM1 gene, preferably of a biallelic marker selected from the group consisting of A1, A2, A3, A4, A5, A6, A7, A8, A9, A10, A11, A12, A13, A14, A15, A16, A17, A18, A19, A20, A21, A22, A23, A24, A25 and A26, and the complements thereof; optionally, wherein said APM1-related biallelic marker is selected from the group consisting of A1, A2, and A7 or the group consisting of A4 and A8. The sequence is between 8 and 1000 nucleotides in length, and preferably comprises at least 8, 10, 12, 15, 18, 20, 25, 35, 40, 50, 60, 70, 80, 100, 250, 500 or 1000 contiguous nucleotides of a nucleotide sequence selected from the group consisting of SEQ ID Nos 1 and 5 or a variant thereof or a complementary sequence thereto. These nucleotide sequences comprise the polymorphic base of either allele 1 or allele 2 of the considered biallelic marker. Optionally, said biallelic marker may be within 6, 5, 4, 3, 2, or 1 nucleotides of the center of said polynucleotide or at the center of said polynucleotide. Optionally, the 3′ end of said contiguous span may be present at the 3′ end of said polynucleotide. Optionally, said biallelic marker may be present at the 3′ end of said polynucleotide. Optionally, the 3′ end of said polynucleotide may be located within or at least 2, 4, 6, 8, 10, 12, 15, 18, 20, 25, 50, 100, 250, 500, or 1000 nucleotides upstream of a biallelic marker of the APM1 gene in said sequence. Optionally, the 3′ end of said polynucleotide may be located 1 nucleotide upstream of a biallelic marker of the APM1 gene in said sequence. Optionally, said polynucleotide may further comprise a label. Optionally, said polynucleotide can be attached to a solid support. In a further embodiment, the polynucleotides defined above can be used alone or in any combination.

In a preferred embodiment, the sequences comprising a polymorphic base of one of the biallelic markers listed in Tables A and B are selected from the group consisting of the nucleotide sequences that have a contiguous span of, that consist of, that are comprised in, or that comprise a polynucleotide selected from the group consisting of the nucleic acid sequences set forth as Nos 9-27, 99-14387, 9-12, 9-13, 99-14405, and 9-16 (listed in Table 1) or a variant thereof or a complementary sequence thereto.

The invention further concerns a nucleic acid encoding the APM1 protein, wherein said nucleic acid comprises a polymorphic base of a biallelic marker selected from the group consisting of A1, A2, A3, A4, A5, A6, A7, A8, A9, A10, A11, A12, A13, A14, A15, A16, A17, A18, A19, A20, A21, A22, A23, A24, A25 and A26 and the complements thereof; optionally, wherein said APM1-related biallelic marker is selected from the group consisting of A1, A2, and A7 or the group consisting of A4 and A8.

The invention also encompasses the use of any polynucleotide for, or any polynucleotide for use in, determining the identity of one or more nucleotides at an APM1-related biallelic marker. In addition, the polynucleotides of the invention for use in determining the identity of one or more nucleotides at an APM1-related biallelic marker encompass polynucleotides with any further limitation described in this disclosure, or those following, specified alone or in any combination. Optionally, said APM1-related biallelic marker may be selected from the group consisting of A1, A2, A3, A4, A5, A6, A7, A8, A9, A10, A11, A12, A13, A14, A15, A16, A17, A18, A19, A20, A21, A22, A23, A24, A25 and A26, and the complements thereof; optionally, wherein said APM1-related biallelic marker is selected from the group consisting of A1, A2, and A7 or the group consisting of A4 and A8. Optionally, said polynucleotide may comprise a sequence disclosed in the present specification. Optionally, said polynucleotide may consist of, or consist essentially of, any polynucleotide described in the present specification. Optionally, said determining may be performed in a hybridization assay, sequencing assay, microsequencing assay, or allele-specific amplification assay. Optionally, said polynucleotide may be attached to a solid support, array, or addressable array. Optionally, said polynucleotide may be labeled. A preferred polynucleotide may be used in a hybridization assay for determining the identity of the nucleotide at a biallelic marker of the APM1 gene. Another preferred polynucleotide may be used in a sequencing or microsequencing assay for determining the identity of the nucleotide at a biallelic marker of the APM1 gene. A third preferred polynucleotide may be used in an allele-specific amplification assay for determining the identity of the nucleotide at a biallelic marker of the APM1 gene. A fourth preferred polynucleotide may be used in amplifying a segment of polynucleotides comprising a biallelic marker of the APM1 gene. Optionally, any of the polynucleotides described above may be attached to a solid support, array, or addressable array. Optionally, said polynucleotide may be labeled.

Additionally, the invention encompasses the use of any polynucleotide for, or any polynucleotide for use in, amplifying a segment of nucleotides comprising an APM1-related biallelic marker. In addition, the polynucleotides of the invention for use in amplifying a segment of nucleotides comprising an APM1-related biallelic marker encompass polynucleotides with any further limitation described in this disclosure, or those following, specified alone or in any combination. Optionally, said APM1-related biallelic marker may be selected from the group consisting of A1, A2, A3, A4, A5, A6, A7, A8, A9, A10, A11, A12, A13, A14, A15, A16, A17, A18, A19, A20, A21, A22, A23, A24, A25 and A26, and the complements thereof; optionally, wherein said APM1-related biallelic marker is selected from the group consisting of A1, A2, and A7 or the group consisting of A4 and A8. Optionally, said polynucleotide may comprise a sequence disclosed in the present specification. Optionally, said polynucleotide may consist of, or consist essentially of, any polynucleotide described in the present specification. Optionally, said amplifying may be performed by PCR or LCR. Optionally, said polynucleotide may be attached to a solid support, array, or addressable array. Optionally, said polynucleotide may be labeled.

The primers for amplification or sequencing reaction of a polynucleotide comprising a biallelic marker of the invention may be designed from the disclosed sequences for any method known in the art. A preferred set of primers are fashioned such that the 3′ end of the contiguous span of identity with a sequence selected from the group consisting of SEQ ID Nos 1 and 5 or a sequence complementary thereto or a variant thereof is present at the 3′ end of the primer. Such a configuration allows the 3′ end of the primer to hybridize to a selected nucleic acid sequence and dramatically increases the efficiency of the primer for amplification or sequencing reactions. Allele specific primers may be designed such that a polymorphic base of a biallelic marker is at the 3′ end of the contiguous span and the contiguous span is present at the 3′ end of the primer. Such allele specific primers tend to selectively prime an amplification or sequencing reaction so long as they are used with a nucleic acid sample that contains one of the two alleles present at a biallelic marker. The 3′ end of the primer of the invention may be located within or at least 2, 4, 6, 8, 10, 12, 15, 18, 20, 25, 50, 100, 250, 500, or 1000 nucleotides upstream of a biallelic marker of APM1 in said sequence or at any other location which is appropriate for their intended use in sequencing, amplification or the location of novel sequences or markers. Thus, another set of preferred amplification primers comprise an isolated polynucleotide consisting essentially of a contiguous span of 8 to 50 nucleotides in a sequence selected from the group consisting of SEQ ID Nos 1 and 5 or a sequence complementary thereto or a variant thereof, wherein the 3′ end of said contiguous span is located at the 3′ end of said polynucleotide, and wherein the 3′ end of said polynucleotide is located upstream of a biallelic marker of the APM1 gene in said sequence. Preferably, those amplification primers comprise a sequence selected from the group consisting of the sequences B1 to B23 and C1 to C24. Primers with their 3′ ends located 1 nucleotide upstream of a biallelic marker of APM1 have a special utility in microsequencing assays. Preferred microsequencing primers are described in Table 3. Optionally, the biallelic marker of the APM1 gene is selected from the group consisting of A1, A2, A3, A5, A6, A7, A9, A10, A11, A12, A13, A14, A15, A16, A17, A18, A19, A20, A21, A22, and A23 and the complements thereof. Optionally, the biallelic marker of the APM1 gene is selected from the group consisting of A4, A8, A24, A25 and A26 and the complements thereof; optionally, wherein said APM1-related biallelic marker is selected from the group consisting of A1, A2, and A7 or the group consisting of A4 and A8. Optionally, microsequencing primers are selected from the group consisting of the nucleotide sequences D1 to D26 and E1 to E26. Alternatively, preferred microsequencing primers are selected from the group consisting of the nucleotide sequences D3, E4, E5, E6, D7 and D8.
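The two primer geometries described above (a 3′ end located 1 nucleotide upstream of the marker, and a polymorphic base at the 3′ end itself) can be sketched as string slices on the marker-bearing strand. This is an illustrative sketch only: the function names, default lengths and toy sequence are hypothetical, positions are 1-based as in the tables, and only primers read from the given strand are covered (a primer for the opposite strand would use the reverse complement).

```python
def microsequencing_primer(sequence: str, marker_pos: int, length: int = 19) -> str:
    """Primer whose 3' end lies 1 nucleotide upstream of the marker.

    marker_pos is the 1-based position of the polymorphic base; the
    returned primer ends at the base immediately before it.
    """
    end = marker_pos - 1          # 0-based index of the polymorphic base
    start = end - length
    if start < 0 or end > len(sequence):
        raise ValueError("not enough upstream sequence for this primer length")
    return sequence[start:end]


def allele_specific_primer(sequence: str, marker_pos: int, allele: str,
                           length: int = 20) -> str:
    """Primer whose 3' terminal base is the chosen allele of the marker."""
    return microsequencing_primer(sequence, marker_pos, length - 1) + allele


# Hypothetical toy sequence with a T/C marker at 1-based position 13.
toy_sequence = "ACGTACGGTCAATGCCTGAA"
toy_micro = microsequencing_primer(toy_sequence, 13, length=8)
toy_allele_2 = allele_specific_primer(toy_sequence, 13, "C", length=8)
```

In the toy case, the microsequencing primer stops one base short of the marker, while the allele-specific primer ends on the allele it is meant to detect.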

The probes of the present invention may be designed from the disclosed sequences for any method known in the art, particularly methods which allow for testing if a marker disclosed herein is present. A preferred set of probes may be designed for use in the hybridization assays of the invention in any manner known in the art such that they selectively bind to one allele of a biallelic marker, but not the other, under any particular set of assay conditions. Preferred hybridization probes comprise the polymorphic base of either allele 1 or allele 2 of the considered biallelic marker. Optionally, said biallelic marker may be within 6, 5, 4, 3, 2, or 1 nucleotides of the center of the hybridization probe or at the center of said probe. Exemplary probes are provided in Table 4 in the Examples.
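A probe with the polymorphic base at its center can be sketched as the marker base plus an equal number of flanking nucleotides on each side. The sketch below is illustrative only; the function name, the flank length and the toy sequence are hypothetical, and it says nothing about the hybridization conditions that make a probe allele-selective in practice.

```python
def centered_probe(sequence: str, marker_pos: int, allele: str, flank: int = 12) -> str:
    """Probe of length 2 * flank + 1 with the chosen allele at its center.

    marker_pos is the 1-based position of the polymorphic base.
    """
    index = marker_pos - 1
    if index - flank < 0 or index + flank >= len(sequence):
        raise ValueError("not enough flanking sequence around the marker")
    return sequence[index - flank:index] + allele + sequence[index + 1:index + flank + 1]


# Hypothetical toy sequence with a T/C marker at 1-based position 13.
toy_probe_sequence = "ACGTACGGTCAATGCCTGAA"
probe_allele_1 = centered_probe(toy_probe_sequence, 13, "T", flank=3)
probe_allele_2 = centered_probe(toy_probe_sequence, 13, "C", flank=3)
```

The two toy probes differ only at their central base, so one probe per allele is obtained from the same flanking sequence.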

The polynucleotides of the present invention are not limited to having the exact flanking sequences surrounding the polymorphic bases which are enumerated in the Sequence Listing. The flanking sequences surrounding the biallelic markers may be lengthened or shortened to any extent compatible with their intended use and the present invention specifically contemplates such sequences. The flanking regions outside of the contiguous span need not be homologous to native flanking sequences that are known to occur in human subjects. The addition of any nucleotide sequence that is compatible with the polynucleotide's intended use is specifically contemplated.

Primers and probes may be labeled or immobilized on a solid support as described in “Oligonucleotide probes and primers”.

The polynucleotides of the invention which are attached to a solid support encompass polynucleotides with any further limitation described in this disclosure, or those following, specified alone or in any combination. Optionally, said polynucleotides may be specified as attached individually or in groups of at least 2, 5, 8, 10, 12, 15, 20, or 25 distinct polynucleotides of the invention to a single solid support. Optionally, polynucleotides other than those of the invention may be attached to the same solid support as polynucleotides of the invention. Optionally, when multiple polynucleotides are attached to a solid support they may be attached at random locations, or in an ordered array. Optionally, said ordered array may be addressable.

The invention also pertains to a method of genotyping comprising determining the identity of a nucleotide at a biallelic marker of the APM1 gene selected from the group consisting of A1, A2, A3, A5, A6, A7, A9, A10, A11, A12, A13, A14, A15, A16, A17, A18, A19, A20, A21, A22, and A23 and the complements thereof in a biological sample; optionally, wherein said APM1-related biallelic marker is selected from the group consisting of A1, A2, and A7 or the group consisting of A4 and A8.

The invention further deals with a method of genotyping comprising determining the identity of a nucleotide at an APM1-related biallelic marker, preferably a biallelic marker of the APM1 gene selected from the group consisting of A4, A8, A24, A25 and A26 and the complements thereof in a biological sample.

Optionally, the biological sample is derived from a single subject. Optionally, the identity of the nucleotides at said biallelic marker is determined for both copies of said biallelic marker present in said individual's genome. Optionally, the biological sample is derived from multiple subjects. Optionally, the method of genotyping described above further comprises amplifying a portion of said sequence comprising the biallelic marker prior to said determining step, for example by a PCR amplification.

The determining step of the above genotyping method may be performed using a hybridization assay, a sequencing assay, an allele-specific amplification assay or a microsequencing assay. Thus, the invention also encompasses methods of genotyping a biological sample comprising determining the identity of a nucleotide at an APM1-related biallelic marker. In addition, the genotyping methods of the invention encompass methods with any further limitation described in this disclosure, or those following, specified alone or in any combination. Optionally, said APM1-related biallelic marker may be selected from the group consisting of A1, A2, A3, A4, A5, A6, A7, A8, A9, A10, A11, A12, A13, A14, A15, A16, A17, A18, A19, A20, A21, A22, A23, A24, A25 and A26, and the complements thereof; optionally, wherein said APM1-related biallelic marker is selected from the group consisting of A1, A2, and A7 or the group consisting of A4 and A8. Optionally, said biological sample is derived from a single individual or subject. Optionally, said method is performed in vitro. Optionally, said biallelic marker is determined for both copies of said biallelic marker present in said individual's genome. Optionally, said biological sample is derived from multiple subjects or individuals. Optionally, said method further comprises amplifying a portion of said sequence comprising the biallelic marker prior to said determining step. Optionally, said amplifying is performed by PCR, LCR, or replication of a recombinant vector comprising an origin of replication and said portion in a host cell. Optionally, said determining is performed by a hybridization assay, sequencing assay, microsequencing assay, or allele-specific amplification assay.
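Once the nucleotide at a marker has been determined for both copies present in an individual's genome, the result can be summarized as a genotype. The sketch below illustrates that bookkeeping step only; it assumes single-base alleles, the function name and labels are hypothetical, and the assay that produces the two observed bases is outside its scope.

```python
def call_genotype(observed_bases, allele_1: str, allele_2: str) -> str:
    """Summarize the two observed bases at a biallelic marker as a genotype.

    observed_bases holds the base found on each of the two chromosome
    copies; both must be one of the marker's two known alleles.
    """
    bases = [base.upper() for base in observed_bases]
    if len(bases) != 2 or any(base not in (allele_1, allele_2) for base in bases):
        raise ValueError("expected two bases drawn from the marker's alleles")
    if bases[0] == bases[1]:
        return "homozygous " + bases[0]
    return "heterozygous " + allele_1 + "/" + allele_2


# Example using the allele pair listed for marker A5 (allele 1: G, allele 2: T);
# the observed bases themselves are hypothetical.
genotype_a = call_genotype(("G", "G"), "G", "T")
genotype_b = call_genotype(("T", "G"), "G", "T")
```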

The present invention also encompasses diagnostic kits comprising one or more polynucleotides of the invention with a portion or all of the necessary reagents and instructions for genotyping a test subject by determining the identity of a nucleotide at a biallelic marker of APM1. The polynucleotides of a kit may optionally be attached to a solid support, or be part of an array or addressable array of polynucleotides. The kit may provide for the determination of the identity of the nucleotide at a marker position by any method known in the art including, but not limited to, a sequencing assay method, a microsequencing assay method, a hybridization assay method, or an allele specific amplification method. Optionally, such a kit may include instructions for scoring the results of the determination with respect to the test subject's risk of suffering from obesity or disorders linked to obesity.

Methods for De Novo Identification of Biallelic Markers

Any of a variety of methods can be used to screen a genomic fragment forsingle nucleotide polymorphisms such as differential hybridization witholigonucleotide probes, detection of changes in the mobility measured bygel electrophoresis or direct sequencing of the amplified nucleic acid.A preferred method for identifying biallelic markers involvescomparative sequencing of genomic DNA fragments from an appropriatenumber of unrelated individuals.

In a first embodiment, DNA samples from unrelated individuals are pooledtogether, following which the genomic DNA of interest is amplified andsequenced. The nucleotide sequences thus obtained are then analyzed toidentify significant polymorphisms. One of the major advantages of thismethod resides in the fact that the pooling of the DNA samplessubstantially reduces the number of DNA amplification reactions andsequencing reactions, which must be carried out. Moreover, this methodis sufficiently sensitive so that a biallelic marker obtained therebyusually demonstrates a sufficient frequency of its less common allele tobe useful in conducting association studies.

In a second embodiment, the DNA samples are not pooled and are thereforeamplified and sequenced individually. This method is usually preferredwhen biallelic markers need to be identified in order to performassociation studies within candidate genes. Preferably, highly relevantgene regions such as promoter regions or exon regions may be screenedfor biallelic markers. A biallelic marker obtained using this method mayshow a lower degree of informativeness for conducting associationstudies, e.g. if the frequency of its less frequent allele may be lessthan about 10%. Such a biallelic marker will, however, be sufficientlyinformative to conduct association studies and it will further beappreciated that including less informative biallelic markers in thegenetic analysis studies of the present invention, may allow in somecases the direct identification of causal mutations, which may,depending on their penetrance, be rare mutations.

The following is a description of the various parameters of a preferred method used by the inventors for the identification of the biallelic markers of the present invention.

Genomic DNA Samples

The genomic DNA samples from which the biallelic markers of the present invention are generated are preferably obtained from unrelated individuals corresponding to a heterogeneous population of known ethnic background. The number of individuals from whom DNA samples are obtained can vary substantially, preferably from about 10 to about 1000, more preferably from about 50 to about 200 individuals. It is usually preferred to collect DNA samples from at least about 100 individuals in order to have sufficient polymorphic diversity in a given population to identify as many markers as possible and to generate statistically significant results.

As for the source of the genomic DNA to be subjected to analysis, any test sample can be foreseen without any particular limitation. These test samples include biological samples, which can be tested by the methods of the present invention described herein, and include human and animal body fluids such as whole blood, serum, plasma, cerebrospinal fluid, urine, lymph fluids, and various external secretions of the respiratory, intestinal and genitourinary tracts, tears, saliva, milk, white blood cells, myelomas and the like; biological fluids such as cell culture supernatants; fixed tissue specimens including tumor and non-tumor tissue and lymph node tissues; bone marrow aspirates and fixed cell specimens. The preferred source of genomic DNA used in the present invention is from peripheral venous blood of each donor. Techniques to prepare genomic DNA from biological samples are well known to the skilled technician. Details of a preferred embodiment are provided in Example 1. The person skilled in the art can choose to amplify pooled or unpooled DNA samples.

DNA Amplification

The identification of biallelic markers in a sample of genomic DNA may be facilitated through the use of DNA amplification methods. DNA samples can be pooled or unpooled for the amplification step. DNA amplification techniques are well known to those skilled in the art. Various methods to amplify DNA fragments carrying biallelic markers are further described hereinbefore in “Amplification of the APM1 gene”. The PCR technology is the preferred amplification technique used to identify new biallelic markers. A typical example of a PCR reaction suitable for the purposes of the present invention is provided in Example 2.

In a first embodiment of the present invention, biallelic markers are identified using genomic sequence information generated by the inventors. Sequenced genomic DNA fragments are used to design primers for the amplification of 500 bp fragments. These 500 bp fragments are amplified from genomic DNA and are scanned for biallelic markers. Primers may be designed using the OSP software (Hillier L. and Green P., 1991). All primers may contain, upstream of the specific target bases, a common oligonucleotide tail that serves as a sequencing primer. Those skilled in the art are familiar with primer extensions, which can be used for these purposes.

Preferred primers, useful for the amplification of genomic sequences encoding the candidate genes, focus on promoters, exons and splice sites of the genes. A biallelic marker has a higher probability of being an eventual causal mutation if it is located in these functional regions of the gene. Preferred amplification primers of the invention include the nucleotide sequences Nos B1 to B23 and the nucleotide sequences Nos C1 to C24 disclosed in Example 2.

Sequencing of Amplified Genomic DNA and Identification of Single Nucleotide Polymorphisms

The amplification products generated as described above are then sequenced using any method known and available to the skilled technician. Methods for sequencing DNA using either the dideoxy-mediated method (Sanger method) or the Maxam-Gilbert method are widely known to those of ordinary skill in the art. Such methods are, for example, disclosed in Sambrook et al. (1989). Alternative approaches include hybridization to high-density DNA probe arrays as described in Chee et al. (1996).

Preferably, the amplified DNA is subjected to automated dideoxy terminator sequencing reactions using a dye-primer cycle sequencing protocol. The products of the sequencing reactions are run on sequencing gels and the sequences are determined using gel image analysis. The polymorphism search is based on the presence of superimposed peaks in the electrophoresis pattern resulting from different bases occurring at the same position. Because each dideoxy terminator is labeled with a different fluorescent molecule, the two peaks corresponding to a biallelic site present distinct colors corresponding to two different nucleotides at the same position on the sequence. However, the presence of two peaks can be an artifact due to background noise. To exclude such an artifact, the two DNA strands are sequenced and a comparison between the peaks is carried out. In order to be registered as a polymorphic sequence, the polymorphism has to be detected on both strands.
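The both-strand confirmation rule described above can be illustrated with a minimal Python sketch. The input format is hypothetical: each strand is represented as a mapping from sequence position to the set of bases whose peaks were observed there, and the reverse-strand observations are assumed to have already been converted to forward-strand coordinates and bases. Peak detection itself is not modeled.

```python
from typing import Dict, Set


def confirmed_polymorphisms(forward: Dict[int, Set[str]],
                            reverse: Dict[int, Set[str]]) -> Dict[int, Set[str]]:
    """Keep only positions where the same two bases are seen on both strands.

    `forward` and `reverse` are hypothetical inputs: position -> set of bases
    whose peaks were observed, both expressed in forward-strand terms.
    """
    confirmed = {}
    for position, forward_bases in forward.items():
        reverse_bases = reverse.get(position, set())
        # A candidate biallelic site shows exactly two superimposed peaks,
        # and the same pair of bases must be detected on the other strand.
        if len(forward_bases) == 2 and forward_bases == reverse_bases:
            confirmed[position] = set(forward_bases)
    return confirmed


forward_calls = {10: {"A", "G"}, 25: {"C", "T"}, 40: {"A"}}
reverse_calls = {10: {"A", "G"}, 25: {"C"}, 40: {"A"}}
print(confirmed_polymorphisms(forward_calls, reverse_calls))
```

In this sketch, position 25 is rejected because its second peak appears on one strand only, which is the background-noise artifact the comparison is meant to exclude.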

The above procedure permits those amplification products that contain biallelic markers to be identified. The detection limit for the frequency of biallelic polymorphisms detected by sequencing pools of 100 individuals is approximately 0.1 for the minor allele, as verified by sequencing pools of known allelic frequencies. However, more than 90% of the biallelic polymorphisms detected by the pooling method have a frequency for the minor allele higher than 0.25. Therefore, the biallelic markers selected by this method have a frequency of at least 0.1 for the minor allele and less than 0.9 for the major allele, preferably at least 0.2 for the minor allele and less than 0.8 for the major allele, more preferably at least 0.3 for the minor allele and less than 0.7 for the major allele, and thus a heterozygosity rate higher than 0.18, preferably higher than 0.32, more preferably higher than 0.42.
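The heterozygosity rates quoted above are consistent with the standard expression for a biallelic site, H = 2pq, where p is the minor allele frequency and q = 1 - p. The short Python sketch below reproduces the three thresholds; it assumes Hardy-Weinberg proportions, which the text does not state explicitly, and the function name is illustrative only.

```python
def expected_heterozygosity(minor_allele_frequency: float) -> float:
    """Expected heterozygosity 2pq of a biallelic marker.

    Assumes Hardy-Weinberg proportions (an assumption of this sketch).
    """
    p = minor_allele_frequency
    if not 0.0 <= p <= 0.5:
        raise ValueError("minor allele frequency must lie between 0 and 0.5")
    return 2.0 * p * (1.0 - p)


for p in (0.1, 0.2, 0.3):
    print(f"minor allele frequency {p:.1f} -> heterozygosity {expected_heterozygosity(p):.2f}")
```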

In another embodiment, biallelic markers are detected by sequencing individual DNA samples; the frequency of the minor allele of such a biallelic marker may be less than 0.1.

Validation of the Biallelic Markers of the Present Invention

The polymorphisms are evaluated for their usefulness as genetic markers by validating that both alleles are present in a population. Validation of the biallelic markers is accomplished by genotyping a group of individuals by a method of the invention and demonstrating that both alleles are present. Microsequencing is a preferred method of genotyping alleles. The validation by genotyping step may be performed on individual samples derived from each individual in the group or by genotyping a pooled sample derived from more than one individual. The group can be as small as one individual if that individual is heterozygous for the allele in question. Preferably the group contains at least three individuals, more preferably the group contains five or six individuals, so that a single validation test will be more likely to result in the validation of more of the biallelic markers that are being tested. It should be noted, however, that when the validation test is performed on a small group it may result in a false negative result if, as a result of sampling error, none of the individuals tested carries one of the two alleles. Thus, the validation process is less useful in demonstrating that a particular initial result is an artifact than it is at demonstrating that there is a bona fide biallelic marker at a particular position in a sequence. All of the genotyping, haplotyping, association, and interaction study methods of the invention may optionally be performed solely with validated biallelic markers.
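The false negative risk mentioned above can be made concrete with a small calculation. The Python sketch below estimates the probability that a validation group shows only one of the two alleles. It assumes unrelated individuals, Hardy-Weinberg proportions, and individual (not pooled) genotyping, so that the 2n chromosomes sampled are independent; these assumptions and the function name are illustrative and are not taken from the specification.

```python
def probability_one_allele_missed(minor_allele_frequency: float,
                                  group_size: int) -> float:
    """Probability that all 2n sampled chromosomes carry the same allele.

    Assumes 2n independent chromosomes (unrelated individuals under
    Hardy-Weinberg proportions); this is an assumption of the sketch.
    """
    p = minor_allele_frequency
    q = 1.0 - p
    chromosomes = 2 * group_size
    # Either every chromosome carries the major allele, or every one
    # carries the minor allele.
    return q ** chromosomes + p ** chromosomes


for n in (1, 3, 6):
    risk = probability_one_allele_missed(0.3, n)
    print(f"group of {n}: probability of a false negative = {risk:.4f}")
```

Under these assumptions, a marker with a minor allele frequency of 0.3 would fail validation more often than not with a single individual, while a group of six reduces the risk to under 2%.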

Evaluation of the Frequency of the Biallelic Markers of the Present Invention

The validated biallelic markers are further evaluated for their usefulness as genetic markers by determining the frequency of the least common allele at the biallelic marker site. The higher the frequency of the less common allele, the greater the usefulness of the biallelic marker in association and interaction studies. The determination of the least common allele is accomplished by genotyping a group of individuals by a method of the invention and demonstrating that both alleles are present. This determination of frequency by genotyping step may be performed on individual samples derived from each individual in the group or by genotyping a pooled sample derived from more than one individual. The group must be large enough to be representative of the population as a whole. Preferably the group contains at least 20 individuals, more preferably the group contains at least 50 individuals, most preferably the group contains at least 100 individuals. Of course, the larger the group, the greater the accuracy of the frequency determination because of reduced sampling error. For an indication of the frequency for the less common allele of a particular biallelic marker of the invention, see Tables A and B. A biallelic marker wherein the frequency of the less common allele is 30% or more is termed a “high quality biallelic marker.” All of the genotyping, haplotyping, association, and interaction study methods of the invention may optionally be performed solely with high quality biallelic markers.
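The effect of group size on sampling error can be sketched as follows. The Python example below uses the binomial standard error of an allele frequency estimated from 2n chromosomes, which is a standard approximation and an assumption of this sketch rather than a formula given in the specification. It also applies the 30% threshold that defines a “high quality biallelic marker”. Function names are illustrative.

```python
import math


def allele_frequency_standard_error(frequency: float, group_size: int) -> float:
    """Binomial standard error of an allele frequency estimate.

    Assumes 2n independently sampled chromosomes (an assumption of this sketch).
    """
    chromosomes = 2 * group_size
    return math.sqrt(frequency * (1.0 - frequency) / chromosomes)


def is_high_quality_marker(minor_allele_frequency: float) -> bool:
    """Apply the 30% threshold for the less common allele."""
    return minor_allele_frequency >= 0.30


for n in (20, 50, 100):
    se = allele_frequency_standard_error(0.3, n)
    print(f"group of {n}: standard error = {se:.3f}")
```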

The invention also relates to methods of estimating the frequency of an allele in a population comprising determining the proportional representation of a nucleotide at an APM1-related biallelic marker in said population. In addition, the methods of estimating the frequency of an allele in a population of the invention encompass methods with any further limitation described in this disclosure, or those following, specified alone or in any combination. Optionally, said APM1-related biallelic marker may be selected from the group consisting of A1, A2, A3, A4, A5, A6, A7, A8, A9, A10, A11, A12, A13, A14, A15, A16, A17, A18, A19, A20, A21, A22, A23, A24, A25 and A26, and the complements thereof; optionally, wherein said APM1-related biallelic marker is selected from the group consisting of A1, A2, and A7 or the group consisting of A4 and A8. Optionally, determining the proportional representation of a nucleotide at an APM1-related biallelic marker may be accomplished by determining the identity of the nucleotides for both copies of said biallelic marker present in the genome of each individual in said population and calculating the proportional representation of said nucleotide at said APM1-related biallelic marker for the population. Optionally, determining the proportional representation may be accomplished by performing a genotyping method of the invention on a pooled biological sample derived from a representative number of individuals, or each individual, in said population, and calculating the proportional amount of said nucleotide compared with the total.
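For the individual-genotyping option described above, the proportional representation of a nucleotide is obtained by counting both copies of the marker in each individual. A minimal Python sketch follows; the genotype representation (a pair of bases per individual) and the example data are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def allele_frequencies(genotypes: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Proportional representation of each nucleotide at one biallelic marker.

    Each genotype is a pair of bases, one per copy of the marker in the
    individual's genome (a hypothetical representation).
    """
    counts = Counter()
    for first_copy, second_copy in genotypes:
        counts[first_copy] += 1
        counts[second_copy] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("at least one genotype is required")
    return {base: count / total for base, count in counts.items()}


# Four made-up individuals genotyped at a single A/G marker.
example = [("A", "A"), ("A", "G"), ("A", "A"), ("G", "G")]
print(allele_frequencies(example))
```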

Methods for Genotyping an Individual for Biallelic Markers

Methods are provided to genotype a biological sample for one or more biallelic markers of the present invention, all of which may be performed in vitro. Such methods of genotyping comprise determining the identity of a nucleotide at an APM1 biallelic marker site by any method known in the art. These methods find use in genotyping case-control populations in association studies as well as individuals in the context of detection of alleles of biallelic markers which are known to be associated with a given trait, in which case both copies of the biallelic marker present in the individual's genome are determined so that an individual may be classified as homozygous or heterozygous for a particular allele.

These genotyping methods can be performed on nucleic acid samples derived from a single individual or pooled DNA samples.

Genotyping can be performed using methods similar to those described above for the identification of the biallelic markers, or using other genotyping methods such as those further described below. In preferred embodiments, the comparison of sequences of amplified genomic fragments from different individuals is used to identify new biallelic markers, whereas microsequencing is used for genotyping known biallelic markers in diagnostic and association study applications.

In one embodiment the invention encompasses methods of genotyping comprising determining the identity of a nucleotide at an APM1-related biallelic marker of SEQ ID No:1 or the complement thereof in a biological sample. Optionally, the biological sample is derived from a single subject. Optionally, the identity of the nucleotides at said biallelic marker is determined for both copies of said biallelic marker present in said individual's genome. Optionally, the biological sample is derived from multiple subjects. Optionally, the method further comprises amplifying a portion of said sequence comprising the biallelic marker prior to said determining step. Optionally, the amplifying step is performed by PCR. Optionally, the determining step is performed by a hybridization assay, a sequencing assay, a microsequencing assay, or an allele-specific amplification assay.

Source of DNA for Genotyping

Any source of nucleic acids, in purified or non-purified form, can be utilized as the starting nucleic acid, provided it contains or is suspected of containing the specific nucleic acid sequence desired. DNA or RNA may be extracted from cells, tissues, body fluids and the like as described above. While nucleic acids for use in the genotyping methods of the invention can be derived from any mammalian source, the test subjects and individuals from which nucleic acid samples are taken are generally understood to be human.

Amplification of DNA Fragments Comprising Biallelic Markers

Methods and polynucleotides are provided to amplify a segment of nucleotides comprising one or more biallelic markers of the present invention. It will be appreciated that amplification of DNA fragments comprising biallelic markers may be used in various methods and for various purposes and is not restricted to genotyping. Nevertheless, many genotyping methods, although not all, require the previous amplification of the DNA region carrying the biallelic marker of interest. Such methods specifically increase the concentration or total number of sequences that span the biallelic marker or include that site and sequences located either distal or proximal to it. Diagnostic assays may also rely on amplification of DNA segments carrying a biallelic marker of the present invention. Amplification of DNA may be achieved by any method known in the art. Amplification techniques are described above in the section entitled “Amplification of the APM1 Gene”.

Some of these amplification methods are particularly suited for the detection of single nucleotide polymorphisms and allow the simultaneous amplification of a target sequence and the identification of the polymorphic nucleotide, as further described below.

The identification of biallelic markers as described above allows the design of appropriate oligonucleotides, which can be used as primers to amplify DNA fragments comprising the biallelic markers of the present invention. Amplification can be performed using the primers initially used to discover new biallelic markers, which are described herein, or any set of primers allowing the amplification of a DNA fragment comprising a biallelic marker of the present invention.

In some embodiments the present invention provides primers for amplifying a DNA fragment containing one or more biallelic markers of the present invention. Preferred amplification primers are listed in Example 2. It will be appreciated that the primers listed are merely exemplary and that any other set of primers which produce amplification products containing one or more biallelic markers of the present invention may also be used.

The spacing of the primers determines the length of the segment to be amplified. In the context of the present invention, amplified segments carrying biallelic markers can range in size from at least about 25 bp to 35 kbp. Amplification fragments from 25-3000 bp are typical, fragments from 50-1000 bp are preferred and fragments from 100-600 bp are highly preferred. It will be appreciated that amplification primers for the biallelic markers may be any sequence which allows the specific amplification of any DNA fragment carrying the markers. Amplification primers may be labeled or immobilized on a solid support as described in “Oligonucleotide probes and primers”.

Methods of Genotyping DNA Samples for Biallelic Markers

Any method known in the art can be used to identify the nucleotide present at a biallelic marker site. Since the biallelic marker allele to be detected has been identified and specified in the present invention, detection will prove simple for one of ordinary skill in the art by employing any of a number of techniques. Many genotyping methods require the previous amplification of the DNA region carrying the biallelic marker of interest. While the amplification of target or signal is often preferred at present, ultrasensitive detection methods which do not require amplification are also encompassed by the present genotyping methods. Methods well known to those skilled in the art that can be used to detect biallelic polymorphisms include methods such as conventional dot blot analyses, single strand conformational polymorphism analysis (SSCP) described by Orita et al. (1989), denaturing gradient gel electrophoresis (DGGE), heteroduplex analysis, mismatch cleavage detection, and other conventional techniques as described in Sheffield et al. (1991), White et al. (1992), Grompe et al. (1989 and 1993). Another method for determining the identity of the nucleotide present at a particular polymorphic site employs a specialized exonuclease-resistant nucleotide derivative as described in U.S. Pat. No. 4,656,127.

Preferred methods involve directly determining the identity of the nucleotide present at a biallelic marker site by sequencing assay, allele-specific amplification assay, or hybridization assay. The following is a description of some preferred methods. A highly preferred method is the microsequencing technique. The term “sequencing” is used herein to refer to polymerase extension of duplex primer/template complexes and includes both traditional sequencing and microsequencing.

1) Sequencing Assays

The nucleotide present at a polymorphic site can be determined by sequencing methods. In a preferred embodiment, DNA samples are subjected to PCR amplification before sequencing as described above. DNA sequencing methods are described in “Sequencing of Amplified Genomic DNA and Identification of Single Nucleotide Polymorphisms”.

Preferably, the amplified DNA is subjected to automated dideoxy terminator sequencing reactions using a dye-primer cycle sequencing protocol. Sequence analysis allows the identification of the base present at the biallelic marker site.

2) Microsequencing Assays

In microsequencing methods, the nucleotide at a polymorphic site in a target DNA is detected by a single nucleotide primer extension reaction. This method involves appropriate microsequencing primers, which hybridize just upstream of the polymorphic base of interest in the target nucleic acid. A polymerase is used to specifically extend the 3′ end of the primer with one single ddNTP (chain terminator) complementary to the nucleotide at the polymorphic site. Next, the identity of the incorporated nucleotide is determined in any suitable way.
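The geometry of a microsequencing primer can be illustrated with a short Python sketch. Given a sequence written 5′ to 3′ and the position of the polymorphic base, it returns a primer ending immediately 5′ of that base on each strand, together with the single ddNTP that would be incorporated for the allele present in the input sequence. The primer length, the function names, and the example sequence are hypothetical; the actual primers of the invention are those listed in Example 4.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(sequence: str) -> str:
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return sequence.translate(_COMPLEMENT)[::-1]


def microsequencing_design(sequence: str, site: int, primer_length: int = 19) -> dict:
    """Primers whose 3' end abuts the polymorphic base at 0-based index `site`.

    `primer_length` is a hypothetical default, not a value from the text.
    """
    if site - primer_length < 0 or site + primer_length >= len(sequence):
        raise ValueError("not enough flanking sequence for the requested primer length")
    upstream = sequence[site - primer_length:site]
    downstream = sequence[site + 1:site + 1 + primer_length]
    return {
        # Same sequence as the given strand, so it anneals to the opposite
        # strand and is extended by the base present in `sequence` at `site`.
        "forward_primer": upstream,
        "forward_ddNTP": sequence[site],
        # Anneals to the given strand downstream of the site and is extended
        # by the complement of the base at `site`.
        "reverse_primer": reverse_complement(downstream),
        "reverse_ddNTP": sequence[site].translate(_COMPLEMENT),
    }


print(microsequencing_design("AAACCCGGGTTTACGT", site=9, primer_length=5))
```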

Typically, microsequencing reactions are carried out using fluorescent ddNTPs and the extended microsequencing primers are analyzed by electrophoresis on ABI 377 sequencing machines to determine the identity of the incorporated nucleotide as described in EP 412 883. Alternatively, capillary electrophoresis can be used in order to process a higher number of assays simultaneously. An example of a typical microsequencing procedure that can be used in the context of the present invention is provided in Example 4.

Different approaches can be used for the labeling and detection of ddNTPs. A homogeneous phase detection method based on fluorescence resonance energy transfer has been described by Chen and Kwok (1997) and Chen et al. (1997). In this method, amplified genomic DNA fragments containing polymorphic sites are incubated with a 5′-fluorescein-labeled primer in the presence of allelic dye-labeled dideoxyribonucleoside triphosphates and a modified Taq polymerase. The dye-labeled primer is extended one base by the dye-terminator specific for the allele present on the template. At the end of the genotyping reaction, the fluorescence intensities of the two dyes in the reaction mixture are analyzed directly without separation or purification. All these steps can be performed in the same tube and the fluorescence changes can be monitored in real time. Alternatively, the extended primer may be analyzed by MALDI-TOF Mass Spectrometry. The base at the polymorphic site is identified by the mass added onto the microsequencing primer (see Haff and Smirnov, 1997).

Microsequencing may be achieved by the established microsequencing method or by developments or derivatives thereof. Alternative methods include several solid-phase microsequencing techniques. The basic microsequencing protocol is the same as described previously, except that the method is conducted as a heterogeneous phase assay, in which the primer or the target molecule is immobilized or captured onto a solid support. To simplify the primer separation and the terminal nucleotide addition analysis, oligonucleotides are attached to solid supports or are modified in such ways that permit affinity separation as well as polymerase extension. The 5′ ends and internal nucleotides of synthetic oligonucleotides can be modified in a number of different ways to permit different affinity separation approaches, e.g., biotinylation. If a single affinity group is used on the oligonucleotides, the oligonucleotides can be separated from the incorporated terminator reagent. This eliminates the need for physical or size separation. More than one oligonucleotide can be separated from the terminator reagent and analyzed simultaneously if more than one affinity group is used. This permits the analysis of several nucleic acid species or more nucleic acid sequence information per extension reaction. The affinity group need not be on the priming oligonucleotide but could alternatively be present on the template. For example, immobilization can be carried out via an interaction between biotinylated DNA and streptavidin-coated microtitration wells or avidin-coated polystyrene particles. In the same manner, oligonucleotides or templates may be attached to a solid support in a high-density format. In such solid phase microsequencing reactions, incorporated ddNTPs can be radiolabeled (Syvänen, 1994) or linked to fluorescein (Livak and Hainer, 1994). The detection of radiolabeled ddNTPs can be achieved through scintillation-based techniques. The detection of fluorescein-linked ddNTPs can be based on the binding of antifluorescein antibody conjugated with alkaline phosphatase, followed by incubation with a chromogenic substrate (such as p-nitrophenyl phosphate). Other possible reporter-detection pairs include: ddNTP linked to dinitrophenyl (DNP) and anti-DNP alkaline phosphatase conjugate (Harju et al., 1993) or biotinylated ddNTP and horseradish peroxidase-conjugated streptavidin with o-phenylenediamine as a substrate (WO 92/15712). As yet another alternative solid-phase microsequencing procedure, Nyren et al. (1993) described a method relying on the detection of DNA polymerase activity by an enzymatic luminometric inorganic pyrophosphate detection assay (ELIDA).

Pastinen et al. (1997) describe a method for multiplex detection of single nucleotide polymorphisms in which the solid phase minisequencing principle is applied to an oligonucleotide array format. High-density arrays of DNA probes attached to a solid support (DNA chips) are further described below.

In one aspect the present invention provides polynucleotides and methods to genotype one or more biallelic markers of the present invention by performing a microsequencing assay. Preferred microsequencing primers include the nucleotide sequences Nos D1 to D26 and E1 to E26. More preferred microsequencing primers are selected from the group consisting of the nucleotide sequences Nos D3, E4, E5, E6, D7, and D8. It will be appreciated that the microsequencing primers listed in Example 4 are merely exemplary and that any primer having a 3′ end immediately adjacent to the polymorphic nucleotide may be used. Similarly, it will be appreciated that microsequencing analysis may be performed for any biallelic marker or any combination of biallelic markers of the present invention. One aspect of the present invention is a solid support which includes one or more microsequencing primers listed in Example 4, or fragments comprising at least 8, 12, 15, 20, 25, 30, 40, or 50 consecutive nucleotides thereof and having a 3′ terminus immediately upstream of the corresponding biallelic marker, for determining the identity of a nucleotide at a biallelic marker site.

3) Allele-Specific Amplification Assay Methods

In one aspect the present invention provides polynucleotides and methods to determine the allele of one or more biallelic markers of the present invention in a biological sample, by allele-specific amplification assays. Methods, primers and various parameters to amplify DNA fragments comprising biallelic markers of the present invention are further described above in “Amplification of DNA Fragments Comprising Biallelic Markers”.

Allele Specific Amplification Primers

Discrimination between the two alleles of a biallelic marker can also be achieved by allele-specific amplification, a selective strategy whereby one of the alleles is amplified without amplification of the other allele. This is accomplished by placing the polymorphic base at the 3′ end of one of the amplification primers. Because the extension proceeds from the 3′ end of the primer, a mismatch at or near this position has an inhibitory effect on amplification. Therefore, under appropriate amplification conditions, these primers only direct amplification on their complementary allele. Determining the precise location of the mismatch and the corresponding assay conditions is well within the ordinary skill in the art.
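As an illustration of placing the polymorphic base at the 3′ end of a primer, the Python sketch below builds one primer per allele from the sequence immediately upstream of the marker. The two primers share every base except the last, so each is fully matched only to its own allele. The function name, the default primer length, and the example flank are hypothetical, and the sketch does not model amplification conditions.

```python
from typing import Dict, Iterable


def allele_specific_primers(upstream_flank: str,
                            alleles: Iterable[str],
                            primer_length: int = 20) -> Dict[str, str]:
    """One primer per allele, with the polymorphic base as the 3'-terminal base.

    `upstream_flank` is the sequence (5' to 3') ending just before the marker;
    `primer_length` is a hypothetical default.
    """
    if primer_length < 2 or len(upstream_flank) < primer_length - 1:
        raise ValueError("upstream flank is too short for the requested primer length")
    shared = upstream_flank[-(primer_length - 1):]
    return {allele: shared + allele for allele in alleles}


primers = allele_specific_primers("ACGTACGTAC", ("A", "G"), primer_length=6)
for allele, primer in primers.items():
    print(f"allele {allele}: 5'-{primer}-3'")
```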

Ligation/Amplification Based Methods

The “Oligonucleotide Ligation Assay” (OLA) uses two oligonucleotides which are designed to be capable of hybridizing to abutting sequences of a single strand of a target molecule. One of the oligonucleotides is biotinylated, and the other is detectably labeled. If the precise complementary sequence is found in a target molecule, the oligonucleotides will hybridize such that their termini abut, and create a ligation substrate that can be captured and detected. OLA is capable of detecting single nucleotide polymorphisms and may be advantageously combined with PCR as described by Nickerson et al. (1990). In this method, PCR is used to achieve the exponential amplification of target DNA, which is then detected using OLA.

Other amplification methods which are particularly suited for the detection of single nucleotide polymorphisms include LCR (ligase chain reaction) and Gap LCR (GLCR), which are described above in “Amplification of the APM1 gene”. LCR uses two pairs of probes to exponentially amplify a specific target. The sequence of each pair of oligonucleotides is selected to permit the pair to hybridize to abutting sequences of the same strand of the target. Such hybridization forms a substrate for a template-dependent ligase. In accordance with the present invention, LCR can be performed with oligonucleotides having the proximal and distal sequences of the same strand of a biallelic marker site. In one embodiment, either oligonucleotide will be designed to include the biallelic marker site. In such an embodiment, the reaction conditions are selected such that the oligonucleotides can be ligated together only if the target molecule either contains or lacks the specific nucleotide that is complementary to the biallelic marker on the oligonucleotide. In an alternative embodiment, the oligonucleotides will not include the biallelic marker, such that when they hybridize to the target molecule, a “gap” is created as described in WO 90/01069. This gap is then “filled” with complementary dNTPs (as mediated by DNA polymerase), or by an additional pair of oligonucleotides. Thus at the end of each cycle, each single strand has a complement capable of serving as a target during the next cycle and exponential allele-specific amplification of the desired sequence is obtained.

Ligase/Polymerase-mediated Genetic Bit Analysis™ is another method for determining the identity of a nucleotide at a preselected site in a nucleic acid molecule (WO 95/21271). This method involves the incorporation of a nucleoside triphosphate that is complementary to the nucleotide present at the preselected site onto the terminus of a primer molecule, and their subsequent ligation to a second oligonucleotide. The reaction is monitored by detecting a specific label attached to the reaction's solid phase or by detection in solution.

4) Hybridization Assay Methods

A preferred method of determining the identity of the nucleotide present at a biallelic marker site involves nucleic acid hybridization. The hybridization probes, which can be conveniently used in such reactions, preferably include the probes defined herein. Any hybridization assay may be used including Southern hybridization, Northern hybridization, dot blot hybridization and solid-phase hybridization (see Sambrook et al., 1989).

Hybridization refers to the formation of a duplex structure by two single stranded nucleic acids due to complementary base pairing. Hybridization can occur between exactly complementary nucleic acid strands or between nucleic acid strands that contain minor regions of mismatch. Specific probes can be designed that hybridize to one form of a biallelic marker and not to the other and therefore are able to discriminate between different allelic forms. Allele-specific probes are often used in pairs, one member of a pair showing a perfect match to a target sequence containing the original allele and the other showing a perfect match to the target sequence containing the alternative allele. Hybridization conditions should be sufficiently stringent that there is a significant difference in hybridization intensity between alleles, and preferably an essentially binary response, whereby a probe hybridizes to only one of the alleles. Stringent, sequence specific hybridization conditions, under which a probe will hybridize only to the exactly complementary target sequence, are well known in the art (Sambrook et al., 1989). Stringent conditions are sequence dependent and will be different in different circumstances. Generally, stringent conditions are selected to be about 5° C. lower than the thermal melting point (Tm) for the specific sequence at a defined ionic strength and pH. Although such hybridizations can be performed in solution, it is preferred to employ a solid-phase hybridization assay. The target DNA comprising a biallelic marker of the present invention may be amplified prior to the hybridization reaction. The presence of a specific allele in the sample is determined by detecting the presence or the absence of stable hybrid duplexes formed between the probe and the target DNA. The detection of hybrid duplexes can be carried out by a number of methods. Various detection assay formats are well known which utilize detectable labels bound to either the target or the probe to enable detection of the hybrid duplexes. Typically, hybridization duplexes are separated from unhybridized nucleic acids and the labels bound to the duplexes are then detected. Those skilled in the art will recognize that wash steps may be employed to wash away excess target DNA or probe as well as unbound conjugate. Further, standard heterogeneous assay formats are suitable for detecting the hybrids using the labels present on the primers and probes.

Two recently developed assays allow hybridization-based allele discrimination with no need for separations or washes (see Landegren U. et al., 1998). The TaqMan assay takes advantage of the 5′ nuclease activity of Taq DNA polymerase to digest a DNA probe annealed specifically to the accumulating amplification product. TaqMan probes are labeled with a donor-acceptor dye pair that interacts via fluorescence energy transfer. Cleavage of the TaqMan probe by the advancing polymerase during amplification dissociates the donor dye from the quenching acceptor dye, greatly increasing the donor fluorescence. All reagents necessary to detect two allelic variants can be assembled at the beginning of the reaction and the results are monitored in real time (see Livak et al., 1995). In an alternative homogeneous hybridization based procedure, molecular beacons are used for allele discriminations. Molecular beacons are hairpin-shaped oligonucleotide probes that report the presence of specific nucleic acids in homogeneous solutions. When they bind to their targets they undergo a conformational reorganization that restores the fluorescence of an internally quenched fluorophore (Tyagi et al., 1998).

The polynucleotides provided herein can be used to produce probes which can be used in hybridization assays for the detection of biallelic marker alleles in biological samples. These probes are characterized in that they preferably comprise between 8 and 50 nucleotides, and in that they are sufficiently complementary to a sequence comprising a biallelic marker of the present invention to hybridize thereto and preferably sufficiently specific to be able to discriminate the targeted sequence for only one nucleotide variation. A particularly preferred probe is 25 nucleotides in length. Preferably the biallelic marker is within 4 nucleotides of the center of the polynucleotide probe. In particularly preferred probes, the biallelic marker is at the center of said polynucleotide. Preferred probes comprise a nucleotide sequence selected from the group consisting of Nos 9-27, 99-14387, 9-12, 9-13, 99-14405, and 9-16 and the sequences complementary thereto, or a fragment thereof, said fragment comprising at least about 8 consecutive nucleotides, preferably 10, 15, 20, more preferably 25, 30, 40, 47, or 50 consecutive nucleotides and containing a polymorphic base. In preferred embodiments the polymorphic base is within 5, 4, 3, 2, or 1 nucleotides of the center of the said polynucleotide, more preferably at the center of said polynucleotide.
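The probe geometry described above (a 25-nucleotide probe with the biallelic marker at its center, one probe per allele) can be illustrated with a short sketch. The function name and the flanking sequences used in the example are hypothetical and are not sequences of the invention; the probes are written in the sense of the input strand, and the complementary strand would be taken where required.

```python
def allele_specific_probes(flank5, alleles, flank3, length=25):
    """Return one probe per allele with the polymorphic base at the center.

    flank5 / flank3 are the sequences immediately 5' and 3' of the marker.
    The names and default length are illustrative assumptions.
    """
    if length % 2 == 0:
        raise ValueError("use an odd length so one base sits exactly at the center")
    half = length // 2
    if len(flank5) < half or len(flank3) < half:
        raise ValueError("flanking sequence too short for the requested length")
    left = flank5[len(flank5) - half:]   # last `half` bases before the marker
    right = flank3[:half]                # first `half` bases after the marker
    return {allele: left + allele + right for allele in alleles}


if __name__ == "__main__":
    # Hypothetical flanking sequences, for illustration only.
    for allele, probe in allele_specific_probes("ACGT" * 4, "AG", "TTGCA" * 3).items():
        print(allele, probe)
```

The two probes returned differ only at the central position, which is the property relied upon for discriminating the two alleles.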

Preferably the probes of the present invention are labeled or immobilized on a solid support. Labels and solid supports are further described in “Oligonucleotide Probes and Primers”. The probes can be non-extendable as described in “Oligonucleotide Probes and Primers”.

By assaying the hybridization to an allele specific probe, one can detect the presence or absence of a biallelic marker allele in a given sample. High-throughput parallel hybridizations in array format are specifically encompassed within “hybridization assays” and are described below.

5) Hybridization to Addressable Arrays of Oligonucleotides

Hybridization assays based on oligonucleotide arrays rely on the differences in hybridization stability of short oligonucleotides to perfectly matched and mismatched target sequence variants. Efficient access to polymorphism information is obtained through a basic structure comprising high-density arrays of oligonucleotide probes attached to a solid support (e.g., the chip) at selected positions. Each DNA chip can contain thousands to millions of individual synthetic DNA probes arranged in a grid-like pattern and miniaturized to the size of a dime.

The chip technology has already been applied with success in numerous cases. For example, the screening of mutations has been undertaken in the BRCA1 gene, in S. cerevisiae mutant strains, and in the protease gene of HIV-1 virus (Hacia et al., 1996; Shoemaker et al., 1996; Kozal et al., 1996). Chips of various formats for use in detecting biallelic polymorphisms can be produced on a customized basis by Affymetrix (GeneChip™), Hyseq (HyChip and HyGnostics), and Protogene Laboratories.

In general, these methods employ arrays of oligonucleotide probes that are complementary to target nucleic acid sequence segments from an individual, which target sequences include a polymorphic marker. EP 785280 describes a tiling strategy for the detection of single nucleotide polymorphisms. Briefly, arrays may generally be “tiled” for a large number of specific polymorphisms. By “tiling” is generally meant the synthesis of a defined set of oligonucleotide probes which is made up of a sequence complementary to the target sequence of interest, as well as preselected variations of that sequence, e.g., substitution of one or more given positions with one or more members of the basis set of monomers, i.e. nucleotides. Tiling strategies are further described in PCT application No. WO 95/11995. In a particular aspect, arrays are tiled for a number of specific, identified biallelic marker sequences. In particular, the array is tiled to include a number of detection blocks, each detection block being specific for a specific biallelic marker or a set of biallelic markers. For example, a detection block may be tiled to include a number of probes, which span the sequence segment that includes a specific polymorphism. To ensure probes that are complementary to each allele, the probes are synthesized in pairs differing at the biallelic marker. In addition to the probes differing at the polymorphic base, monosubstituted probes are also generally tiled within the detection block. These monosubstituted probes have bases at and up to a certain number of bases in either direction from the polymorphism, substituted with the remaining nucleotides (selected from A, T, G, C and U). Typically the probes in a tiled detection block will include substitutions of the sequence positions up to and including those that are 5 bases away from the biallelic marker. The monosubstituted probes provide internal controls for the tiled array, to distinguish actual hybridization from artefactual cross-hybridization. Upon completion of hybridization with the target sequence and washing of the array, the array is scanned to determine the position on the array to which the target sequence hybridizes. The hybridization data from the scanned array is then analyzed to identify which allele or alleles of the biallelic marker are present in the sample. Hybridization and scanning may be carried out as described in PCT applications Nos. WO 92/10092 and WO 95/11995 and U.S. Pat. No. 5,424,186.
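The composition of a tiled detection block as described above can be sketched as follows. This is a simplified illustration and not the layout used by any particular chip manufacturer: the function name, the DNA-only alphabet, the 25-nucleotide probe length and the target sequence in the example are all assumptions, and probes are written in the sense of the target strand rather than as its complement.

```python
BASES = "ACGT"  # DNA alphabet assumed; U is ignored in this sketch


def detection_block(target, marker_pos, alleles, probe_len=25, window=5):
    """Build a detection block for one biallelic marker.

    Returns a dict mapping probe sequence to a short description. The block
    holds one perfect-match probe per allele plus monosubstituted control
    probes at every position up to `window` bases from the marker.
    """
    half = probe_len // 2
    if probe_len % 2 == 0 or window > half:
        raise ValueError("need an odd probe length and window <= half length")
    start, end = marker_pos - half, marker_pos + half + 1
    if start < 0 or end > len(target):
        raise ValueError("marker too close to the end of the target sequence")

    block = {}
    for allele in alleles:
        perfect = target[start:marker_pos] + allele + target[marker_pos + 1:end]
        block.setdefault(perfect, "perfect match, allele " + allele)
        for offset in range(-window, window + 1):
            pos = half + offset
            for base in BASES:
                # Skip the unchanged base; at the marker itself, skip the
                # other allele, which already has its own perfect-match probe.
                if base == perfect[pos] or (offset == 0 and base in alleles):
                    continue
                probe = perfect[:pos] + base + perfect[pos + 1:]
                block.setdefault(
                    probe, f"control, allele {allele}, offset {offset:+d}, base {base}"
                )
    return block


if __name__ == "__main__":
    # Hypothetical target sequence with an A/G marker at index 20.
    block = detection_block("ACGT" * 10, 20, "AG")
    print(len(block), "probes in the detection block")
```

Under these assumptions, the control probes substituted at the marker position are shared between the two allele backgrounds, so they are stored only once.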

Thus, in some embodiments, the chips may comprise an array of nucleic acid sequences of fragments of about 15 nucleotides in length. In further embodiments, the chip may comprise an array including at least one of the sequences selected from the group consisting of 9-27, 99-14387, 9-12, 9-13, 99-14405, and 9-16 and the sequences complementary thereto, or a fragment thereof, said fragment comprising at least about 8 consecutive nucleotides, preferably 10, 15, 20, more preferably 25, 30, 40, 47, or 50 consecutive nucleotides and containing a polymorphic base. In preferred embodiments the polymorphic base is within 5, 4, 3, 2, or 1 nucleotides of the center of the said polynucleotide, more preferably at the center of said polynucleotide. In some embodiments, the chip may comprise an array of at least 2, 3, 4, 5, 6, 7, 8 or more of these polynucleotides of the invention. Solid supports and polynucleotides of the present invention attached to solid supports are further described in “Oligonucleotide Probes and Primers”.

6) Integrated Systems

Another technique, which may be used to analyze polymorphisms, includes multicomponent integrated systems, which miniaturize and compartmentalize processes such as PCR and capillary electrophoresis reactions in a single functional device. An example of such a technique is disclosed in U.S. Pat. No. 5,589,136, which describes the integration of PCR amplification and capillary electrophoresis in chips.

Integrated systems can be envisaged mainly when microfluidic systems are used. These systems comprise a pattern of microchannels designed onto a glass, silicon, quartz, or plastic wafer included on a microchip. The movements of the samples are controlled by electric, electroosmotic or hydrostatic forces applied across different areas of the microchip to create functional microscopic valves and pumps with no moving parts. Varying the voltage controls the liquid flow at intersections between the micro-machined channels and changes the liquid flow rate for pumping across different sections of the microchip.

For genotyping biallelic markers, the microfluidic system may integrate nucleic acid amplification, microsequencing, capillary electrophoresis and a detection method such as laser-induced fluorescence detection.

In a first step, the DNA samples are amplified, preferably by PCR. Then, the amplification products are subjected to automated microsequencing reactions using ddNTPs (specific fluorescence for each ddNTP) and the appropriate oligonucleotide microsequencing primers which hybridize just upstream of the targeted polymorphic base. Once the extension at the 3′ end is completed, the primers are separated from the unincorporated fluorescent ddNTPs by capillary electrophoresis. The separation medium used in capillary electrophoresis can for example be polyacrylamide, polyethyleneglycol or dextran. The incorporated ddNTPs in the single-nucleotide primer extension products are identified by fluorescence detection. This microchip can be used to process at least 96 to 384 samples in parallel. It can use the usual four color laser induced fluorescence detection of the ddNTPs.

Methods of Genetic Analysis Using the Biallelic Markers of the Present Invention

Different methods are available for the genetic analysis of complex traits (see Lander and Schork, 1994). The search for disease-susceptibility genes is conducted using two main methods: the linkage approach in which evidence is sought for cosegregation between a locus and a putative trait locus using family studies, and the association approach in which evidence is sought for a statistically significant association between an allele and a trait or a trait causing allele (Khoury et al., 1993). In general, the biallelic markers of the present invention find use in any method known in the art to demonstrate a statistically significant correlation between a genotype and a phenotype. The biallelic markers may be used in parametric and non-parametric linkage analysis methods. Preferably, the biallelic markers of the present invention are used to identify genes associated with detectable traits using association studies, an approach which does not require the use of affected families and which permits the identification of genes associated with complex and sporadic traits.

The genetic analysis using the biallelic markers of the present invention may be conducted on any scale. The whole set of biallelic markers of the present invention or any subset of biallelic markers of the present invention corresponding to the candidate gene may be used. Further, any set of genetic markers including a biallelic marker of the present invention may be used. A set of biallelic polymorphisms that could be used as genetic markers in combination with the biallelic markers of the present invention has been described in WO 98/20165. As mentioned above, it should be noted that the biallelic markers of the present invention may be included in any complete or partial genetic map of the human genome. These different uses are specifically contemplated in the present invention and claims.

The invention also comprises methods of detecting an association between a genotype and a phenotype, comprising the steps of: a) genotyping at least one APM1-related biallelic marker in a trait positive population according to a genotyping method of the invention; b) genotyping said APM1-related biallelic marker in a control population according to a genotyping method of the invention; and c) determining whether a statistically significant association exists between said genotype and said phenotype. In addition, the methods of detecting an association between a genotype and a phenotype of the invention encompass methods with any further limitation described in this disclosure, or those following, specified alone or in any combination. Optionally, said APM1-related biallelic marker may be selected from the group consisting of A1, A2, A3, A4, A5, A6, A7, A8, A9, A10, A11, A12, A13, A14, A15, A16, A17, A18, A19, A20, A21, A22, A23, A24, A25 and A26, and the complements thereof; optionally, wherein said APM1-related biallelic marker is selected from the group consisting of A1, A2, and A7 or the group consisting of A4 and A8. Optionally, said control population may be a trait negative population, or a random population. Optionally, each of said genotyping steps a) and b) may be performed on a pooled biological sample derived from each of said populations. Optionally, each of said genotyping steps a) and b) is performed separately on biological samples derived from each individual in said population or a subsample thereof. Optionally, said phenotype is obesity or a disorder related to obesity. Optionally, said disorder related to obesity is selected from the group consisting of atherosclerosis, insulin resistance, hypertension, hyperlipidemia, hypertriglyceridemia, cardiovascular disease, microangiopathy in obese individuals with Type II diabetes, ocular lesions associated with microangiopathy in obese individuals with Type II diabetes, renal lesions associated with microangiopathy in obese individuals with Type II diabetes, and Syndrome X.
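Step c) of the method above may be carried out with any suitable statistical test. As one non-limiting illustration, the sketch below applies a Pearson chi-square test (1 degree of freedom) to the allele counts observed in the trait positive and control populations. The choice of test, the function name and the counts in the example are assumptions made for illustration only.

```python
import math


def allelic_association(case_counts, control_counts):
    """Pearson chi-square test on a 2x2 table of allele counts.

    case_counts and control_counts are (allele 1 count, allele 2 count),
    counted over chromosomes (two per genotyped individual).
    Returns (chi-square statistic, p-value for 1 degree of freedom).
    """
    a, b = case_counts
    c, d = control_counts
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        raise ValueError("every row and column of the table must be non-empty")
    chi2 = n * (a * d - b * c) ** 2 / denominator
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


if __name__ == "__main__":
    # Hypothetical counts: 60/40 alleles in cases versus 40/60 in controls.
    chi2, p = allelic_association((60, 40), (40, 60))
    print(f"chi2 = {chi2:.2f}, p = {p:.4f}")
```

When several markers are tested, a correction for multiple testing would normally be applied to the resulting p-values; that step is not shown here.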

The invention also encompasses methods of estimating the frequency of a haplotype for a set of biallelic markers in a population, comprising the steps of: a) genotyping at least two APM1-related biallelic markers for each individual in said population or a subsample thereof, according to a genotyping method of the invention; and b) applying a haplotype determination method to the identities of the nucleotides determined in step a) to obtain an estimate of said frequency. In addition, the methods of estimating the frequency of a haplotype of the invention encompass methods with any further limitation described in this disclosure, or those following, specified alone or in any combination. Optionally, said biallelic marker may be selected from the group consisting of A1, A2, A3, A4, A5, A6, A7, A8, A9, A10, A11, A12, A13, A14, A15, A16, A17, A18, A19, A20, A26, and the complements thereof; optionally, wherein said APM1-related biallelic marker is selected from the group consisting of A1, A2, and A7 or the group consisting of A4 and A8. Optionally, said haplotype determination method is performed by asymmetric PCR amplification, double PCR amplification of specific alleles, the Clark algorithm, or an expectation-maximization algorithm.

An additional embodiment of the present invention encompasses methods of detecting an association between a haplotype and a phenotype, comprising the steps of: a) estimating the frequency of at least one haplotype in a trait positive population, according to a method of the invention for estimating the frequency of a haplotype; b) estimating the frequency of said haplotype in a control population, according to a method of the invention for estimating the frequency of a haplotype; and c) determining whether a statistically significant association exists between said haplotype and said phenotype. In addition, the methods of detecting an association between a haplotype and a phenotype of the invention encompass methods with any further limitation described in this disclosure, or those following. Optionally, said biallelic marker may be selected from the group consisting of A1, A2, A3, A4, A5, A6, A7, A8, A9, A10, A11, A12, A13, A14, A15, A16, A17, A18, A19, A20, A21, A22, A23, A24, A25 and A26, and the complements thereof; optionally, wherein said APM1-related biallelic marker is selected from the group consisting of A1, A2, and A7 or the group consisting of A4 and A8. Optionally, said control population is a trait negative population, or a random population. Optionally, said phenotype is obesity or a disorder related to obesity. Optionally, said method comprises the additional steps of determining the phenotype in said trait positive and said control populations prior to step c). Optionally, said disorder related to obesity is selected from the group consisting of atherosclerosis, insulin resistance, hypertension, hyperlipidemia, hypertriglyceridemia, cardiovascular disease, microangiopathy in obese individuals with Type II diabetes, ocular lesions associated with microangiopathy in obese individuals with Type II diabetes, renal lesions associated with microangiopathy in obese individuals with Type II diabetes, and Syndrome X.

Linkage Analysis

Linkage analysis is based upon establishing a correlation between the transmission of genetic markers and that of a specific trait throughout generations within a family. Thus, the aim of linkage analysis is to detect marker loci that show cosegregation with a trait of interest in pedigrees.

Parametric Methods

When data are available from successive generations there is the opportunity to study the degree of linkage between pairs of loci. Estimates of the recombination fraction enable loci to be ordered and placed onto a genetic map. With loci that are genetic markers, a genetic map can be established, and then the strength of linkage between markers and traits can be calculated and used to indicate the relative positions of markers and genes affecting those traits (Weir, 1996). The classical method for linkage analysis is the logarithm of odds (lod) score method (see Morton, 1955; Ott, 1991). Calculation of lod scores requires specification of the mode of inheritance for the disease (parametric method). Generally, the length of the candidate region identified using linkage analysis is between 2 and 20 Mb. Once a candidate region is identified as described above, analysis of recombinant individuals using additional markers allows further delineation of the candidate region. Linkage analysis studies have generally relied on the use of a maximum of 5,000 microsatellite markers, thus limiting the maximum theoretical attainable resolution of linkage analysis to about 600 kb on average.
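For the simplest situation, in which the phase is known and each informative meiosis can be scored as recombinant or non-recombinant, the lod score at a recombination fraction θ compares the likelihood of the data at θ with the likelihood under free recombination (θ = 0.5). The sketch below illustrates this textbook special case only; it is not a general pedigree likelihood calculation, and the function name and the counts in the example are assumptions.

```python
import math


def lod_score(recombinants, meioses, theta):
    """Lod score for phase-known, fully informative meioses.

    Z(theta) = log10( theta^k * (1 - theta)^(n - k) / 0.5^n )
    with k recombinants observed among n meioses.
    """
    if not 0 < theta <= 0.5:
        raise ValueError("theta must lie in (0, 0.5]")
    k, n = recombinants, meioses
    if not 0 <= k <= n:
        raise ValueError("recombinants must be between 0 and meioses")
    log_likelihood = k * math.log10(theta) + (n - k) * math.log10(1 - theta)
    log_likelihood_null = n * math.log10(0.5)
    return log_likelihood - log_likelihood_null


if __name__ == "__main__":
    # Hypothetical data: 1 recombinant among 10 meioses.
    for theta in (0.05, 0.1, 0.2, 0.3, 0.5):
        print(theta, round(lod_score(1, 10, theta), 3))
```

Scanning θ over a grid, as in the usage example, locates the value at which the lod score is maximal, which serves as the estimate of the recombination fraction.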

Linkage analysis has been successfully applied to map simple genetic traits that show clear Mendelian inheritance patterns and which have a high penetrance (i.e., the ratio between the number of trait positive carriers of allele a and the total number of a carriers in the population). However, parametric linkage analysis suffers from a variety of drawbacks. First, it is limited by its reliance on the choice of a genetic model suitable for each studied trait. Furthermore, as already mentioned, the resolution attainable using linkage analysis is limited, and complementary studies are required to refine the analysis of the typical 2 Mb to 20 Mb regions initially identified through linkage analysis. In addition, parametric linkage analysis approaches have proven difficult when applied to complex genetic traits, such as those due to the combined action of multiple genes and/or environmental factors. It is very difficult to model these factors adequately in a lod score analysis. In such cases, too large an effort and cost are needed to recruit the adequate number of affected families required for applying linkage analysis to these situations, as recently discussed by Risch, N. and Merikangas, K. (1996).

Non-Parametric Methods

The advantage of the so-called non-parametric methods for linkage analysis is that they do not require specification of the mode of inheritance for the disease; they therefore tend to be more useful for the analysis of complex traits. In non-parametric methods, one tries to prove that the inheritance pattern of a chromosomal region is not consistent with random Mendelian segregation by showing that affected relatives inherit identical copies of the region more often than expected by chance. Affected relatives should show excess “allele sharing” even in the presence of incomplete penetrance and polygenic inheritance. In non-parametric linkage analysis the degree of agreement at a marker locus in two individuals can be measured either by the number of alleles identical by state (IBS) or by the number of alleles identical by descent (IBD). Affected sib pair analysis is a well-known special case and is the simplest form of these methods.
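The number of alleles identical by state at one marker can be obtained directly from the two genotypes, without pedigree information, as the following sketch shows. Counting alleles identical by descent requires parental genotypes or other pedigree data and is not attempted here. The function name and the genotypes in the example are assumptions.

```python
from collections import Counter


def ibs_count(genotype_a, genotype_b):
    """Number of alleles (0, 1 or 2) shared identical by state at one locus.

    Each genotype is a pair of allele symbols, e.g. ("A", "G").
    """
    shared = Counter(genotype_a) & Counter(genotype_b)  # multiset intersection
    return sum(shared.values())


if __name__ == "__main__":
    # Hypothetical sib pair genotyped at one biallelic marker.
    print(ibs_count(("A", "G"), ("A", "A")))
```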

The biallelic markers of the present invention may be used in both parametric and non-parametric linkage analysis. Preferably biallelic markers may be used in non-parametric methods which allow the mapping of genes involved in complex traits. The biallelic markers of the present invention may be used in both IBD- and IBS-methods to map genes affecting a complex trait. In such studies, taking advantage of the high density of biallelic markers, several adjacent biallelic marker loci may be pooled to achieve the efficiency attained by multi-allelic markers (Zhao et al., 1998).

Population Association Studies

The present invention comprises methods for identifying if the APM1 gene is associated with a detectable trait using the biallelic markers of the present invention. In one embodiment the present invention comprises methods to detect an association between a biallelic marker allele or a biallelic marker haplotype and a trait. Further, the invention comprises methods to identify a trait causing allele in linkage disequilibrium with any biallelic marker allele of the present invention.

As described above, alternative approaches can be employed to perform association studies: genome-wide association studies, candidate region association studies and candidate gene association studies. In a preferred embodiment, the biallelic markers of the present invention are used to perform candidate gene association studies. The candidate gene analysis clearly provides a short-cut approach to the identification of genes and gene polymorphisms related to a particular trait when some information concerning the biology of the trait is available. Further, the biallelic markers of the present invention may be incorporated in any map of genetic markers of the human genome in order to perform genome-wide association studies. Methods to generate a high-density map of biallelic markers have been described in U.S. Provisional Patent application Ser. No. 60/082,614. The biallelic markers of the present invention may further be incorporated in any map of a specific candidate region of the genome (a specific chromosome or a specific chromosomal segment for example).

As mentioned above, association studies may be conducted within the general population and are not limited to studies performed on related individuals in affected families. Association studies are extremely valuable as they permit the analysis of sporadic or multifactor traits. Moreover, association studies represent a powerful method for fine-scale mapping enabling much finer mapping of trait causing alleles than linkage studies. Studies based on pedigrees often only narrow the location of the trait causing allele. Association studies using the biallelic markers of the present invention can therefore be used to refine the location of a trait causing allele in a candidate region identified by linkage analysis methods. Moreover, once a chromosome segment of interest has been identified, the presence of a candidate gene, such as a candidate gene of the present invention, in the region of interest can provide a shortcut to the identification of the trait causing allele. Biallelic markers of the present invention can be used to demonstrate that a candidate gene is associated with a trait. Such uses are specifically contemplated in the present invention.

Determining the Frequency of a Biallelic Marker Allele or of a Biallelic Marker Haplotype in a Population

Association studies explore the relationships among frequencies for sets of alleles between loci.

Determining the Frequency of an Allele in a Population

Allelic frequencies of the biallelic markers in a population can be determined using one of the methods described above under the heading “Methods for genotyping an individual for biallelic markers”, or any genotyping procedure suitable for this intended purpose. Genotyping pooled samples or individual samples can determine the frequency of a biallelic marker allele in a population. One way to reduce the number of genotypings required is to use pooled samples. A major obstacle in using pooled samples is the accuracy and reproducibility with which DNA concentrations can be determined when setting up the pools. Genotyping individual samples provides higher sensitivity, reproducibility and accuracy and is the preferred method used in the present invention. Preferably, each individual is genotyped separately and simple gene counting is applied to determine the frequency of an allele of a biallelic marker or of a genotype in a given population.
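Simple gene counting on individually genotyped samples can be sketched as follows. Each individual contributes two alleles, and the frequency of an allele is its count divided by the total number of alleles observed. The function name and the genotypes in the example are assumptions.

```python
from collections import Counter


def allele_frequencies(genotypes):
    """Estimate allele frequencies at one marker by gene counting.

    genotypes is a list of two-letter strings such as "AA", "AG" or "GG",
    one per individual.
    """
    counts = Counter()
    for genotype in genotypes:
        if len(genotype) != 2:
            raise ValueError("each genotype must contain exactly two alleles")
        counts.update(genotype)  # adds both alleles of the individual
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no genotypes supplied")
    return {allele: n / total for allele, n in counts.items()}


if __name__ == "__main__":
    # Hypothetical sample of four individuals.
    print(allele_frequencies(["AA", "AA", "AG", "GG"]))
```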

Determining the Frequency of a Haplotype in a Population

The gametic phase of haplotypes is unknown when diploid individuals are heterozygous at more than one locus. Using genealogical information in families, gametic phase can sometimes be inferred (Perlin et al., 1994). When no genealogical information is available, different strategies may be used. One possibility is that the multiple-site heterozygous diploids can be eliminated from the analysis, keeping only the homozygotes and the single-site heterozygote individuals, but this approach might lead to a possible bias in the sample composition and the underestimation of low-frequency haplotypes. Another possibility is that single chromosomes can be studied independently, for example, by asymmetric PCR amplification (see Newton et al., 1989; Wu et al., 1989) or by isolation of single chromosomes by limit dilution followed by PCR amplification (see Ruano et al., 1990). Further, a sample may be haplotyped for sufficiently close biallelic markers by double PCR amplification of specific alleles (Sarkar, G. and Sommer S. S., 1991). These approaches are not entirely satisfying either because of their technical complexity, the additional cost they entail, their lack of generalization at a large scale, or the possible biases they introduce. To overcome these difficulties, an algorithm to infer the phase of PCR-amplified DNA genotypes introduced by Clark, A. G. (1990) may be used. Briefly, the principle is to start filling a preliminary list of haplotypes present in the sample by examining unambiguous individuals, that is, the complete homozygotes and the single-site heterozygotes. Then other individuals in the same sample are screened for the possible occurrence of previously recognized haplotypes. For each positive identification, the complementary haplotype is added to the list of recognized haplotypes, until the phase information for all individuals is either resolved or identified as unresolved. This method assigns a single haplotype to each multiheterozygous individual, whereas several haplotypes are possible when there is more than one heterozygous site. Alternatively, one can use methods estimating haplotype frequencies in a population without assigning haplotypes to each individual. Preferably, a method based on an expectation-maximization (EM) algorithm (Dempster et al., 1977) leading to maximum-likelihood estimates of haplotype frequencies under the assumption of Hardy-Weinberg proportions (random mating) is used (see Excoffier L. and Slatkin M., 1995). The EM algorithm is a generalized iterative maximum-likelihood approach to estimation that is useful when data are ambiguous and/or incomplete. The EM algorithm is used to resolve heterozygotes into haplotypes. Haplotype estimations are further described below under the heading “Statistical Methods.” Any other method known in the art to determine or to estimate the frequency of a haplotype in a population may be used.
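The EM approach can be illustrated for the smallest non-trivial case, two biallelic markers, where only the double heterozygotes have an ambiguous phase. The sketch below is a minimal illustration under the assumption of Hardy-Weinberg proportions. The coding of genotypes (number of copies of allele “1” at each locus), the fixed number of iterations, the function names and the example sample are all assumptions and do not reproduce any particular published implementation.

```python
from itertools import product

# The four possible haplotypes for two biallelic loci, alleles coded 0 and 1.
HAPLOTYPES = list(product((0, 1), repeat=2))


def compatible_pairs(genotype):
    """All unordered haplotype pairs whose allele counts match the genotype."""
    pairs = []
    for i, h1 in enumerate(HAPLOTYPES):
        for h2 in HAPLOTYPES[i:]:
            if (h1[0] + h2[0], h1[1] + h2[1]) == tuple(genotype):
                pairs.append((h1, h2))
    return pairs


def em_haplotype_frequencies(genotypes, n_iter=100):
    """Estimate haplotype frequencies from unphased two-locus genotypes.

    genotypes: list of (copies of allele 1 at locus A, copies at locus B).
    """
    freqs = {h: 1.0 / len(HAPLOTYPES) for h in HAPLOTYPES}  # uniform start
    n_chromosomes = 2 * len(genotypes)
    for _ in range(n_iter):
        expected = {h: 0.0 for h in HAPLOTYPES}
        for genotype in genotypes:
            pairs = compatible_pairs(genotype)
            # E-step: Hardy-Weinberg probability of each compatible pair.
            weights = [
                freqs[h1] * freqs[h2] * (1 if h1 == h2 else 2) for h1, h2 in pairs
            ]
            total = sum(weights)
            for (h1, h2), weight in zip(pairs, weights):
                expected[h1] += weight / total
                expected[h2] += weight / total
        # M-step: new frequencies are the expected haplotype counts, normalized.
        freqs = {h: expected[h] / n_chromosomes for h in HAPLOTYPES}
    return freqs


if __name__ == "__main__":
    # Hypothetical sample: 4 + 4 double homozygotes and 2 double heterozygotes.
    sample = [(2, 2)] * 4 + [(0, 0)] * 4 + [(1, 1)] * 2
    for haplotype, freq in em_haplotype_frequencies(sample).items():
        print(haplotype, round(freq, 4))
```

In this example the unambiguous individuals carry only the haplotypes (0, 0) and (1, 1), so the iterations progressively assign the double heterozygotes to that pair rather than to (0, 1) and (1, 0). In practice a convergence criterion and several starting points would normally replace the fixed iteration count.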

Linkage Disequilibrium Analysis

Linkage disequilibrium is the non-random association of alleles at two or more loci and represents a powerful tool for mapping genes involved in disease traits (see Ajioka R. S. et al., 1997). Biallelic markers, because they are densely spaced in the human genome and can be genotyped in greater numbers than other types of genetic markers (such as RFLP or VNTR markers), are particularly useful in genetic analysis based on linkage disequilibrium.

When a disease mutation is first introduced into a population (by a new mutation or the immigration of a mutation carrier), it necessarily resides on a single chromosome and thus on a single “background” or “ancestral” haplotype of linked markers. Consequently, there is complete disequilibrium between these markers and the disease mutation: one finds the disease mutation only in the presence of a specific set of marker alleles. Through subsequent generations recombination events occur between the disease mutation and these marker polymorphisms, and the disequilibrium gradually dissipates. The pace of this dissipation is a function of the recombination frequency, so the markers closest to the disease gene will manifest higher levels of disequilibrium than those that are further away. When not broken up by recombination, “ancestral” haplotypes and linkage disequilibrium between marker alleles at different loci can be tracked not only through pedigrees but also through populations. Linkage disequilibrium is usually seen as an association between one specific allele at one locus and another specific allele at a second locus.
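The dependence of this dissipation on the recombination frequency can be made concrete with the standard population-genetic relation D_t = D_0 (1 − θ)^t, where D_0 is the initial disequilibrium, θ the recombination fraction between the two loci and t the number of generations. The relation assumes a large, randomly mating population without selection, mutation or migration; these assumptions, the function name and the numerical values in the example are introduced here for illustration only.

```python
def disequilibrium_after(d0, theta, generations):
    """Expected disequilibrium after a number of generations.

    Implements D_t = D_0 * (1 - theta)^t, which holds under random mating
    in a large population with no selection, mutation or migration.
    """
    if not 0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")
    return d0 * (1 - theta) ** generations


if __name__ == "__main__":
    # Hypothetical comparison of a closely linked and a more distant marker.
    for theta in (0.001, 0.01):
        print(theta, disequilibrium_after(0.25, theta, 100))
```

Under these assumptions the marker with the smaller recombination fraction retains more of the initial disequilibrium after the same number of generations.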

The pattern or curve of disequilibrium between disease and marker loci is expected to exhibit a maximum that occurs at the disease locus. Consequently, the amount of linkage disequilibrium between a disease allele and closely linked genetic markers may yield valuable information regarding the location of the disease gene. For fine-scale mapping of a disease locus, it is useful to have some knowledge of the patterns of linkage disequilibrium that exist between markers in the studied region. As mentioned above, the mapping resolution achieved through the analysis of linkage disequilibrium is much higher than that of linkage studies. The high density of biallelic markers combined with linkage disequilibrium analysis provides powerful tools for fine-scale mapping. Different methods to calculate linkage disequilibrium are described below under the heading “Statistical Methods”.

Population-Based Case-Control Studies of Trait-Marker Associations

As mentioned above, the occurrence of pairs of specific alleles at different loci on the same chromosome is not random and the deviation from random is called linkage disequilibrium. Association studies focus on population frequencies and rely on the phenomenon of linkage disequilibrium. If a specific allele in a given gene is directly involved in causing a particular trait, its frequency will be statistically increased in an affected (trait positive) population, when compared to the frequency in a trait negative population or in a random control population. As a consequence of the existence of linkage disequilibrium, the frequency of all other alleles present in the haplotype carrying the trait-causing allele will also be increased in trait positive individuals compared to trait negative individuals or random controls. Therefore, association between the trait and any allele (specifically a biallelic marker allele) in linkage disequilibrium with the trait-causing allele will suffice to suggest the presence of a trait-related gene in that particular region. Case-control populations can be genotyped for biallelic markers to identify associations that narrowly locate a trait causing allele. As any marker in linkage disequilibrium with one given marker associated with a trait will be associated with the trait, linkage disequilibrium allows the relative frequencies in case-control populations of a limited number of genetic polymorphisms (specifically biallelic markers) to be analyzed as an alternative to screening all possible functional polymorphisms in order to find trait-causing alleles. Association studies compare the frequency of marker alleles in unrelated case-control populations, and represent powerful tools for the dissection of complex traits.

Case-Control Populations (Inclusion Criteria)

Population-based association studies do not concern familial inheritance but compare the prevalence of a particular genetic marker, or a set of markers, in case-control populations. They are case-control studies based on comparison of unrelated case (affected or trait positive) individuals and unrelated control (unaffected, trait negative or random) individuals. Preferably the control group is composed of unaffected or trait negative individuals. Further, the control group is ethnically matched to the case population. Moreover, the control group is preferably matched to the case population for the main known confounding factor for the trait under study (for example age-matched for an age-dependent trait). Ideally, individuals in the two samples are paired in such a way that they are expected to differ only in their disease status. The terms “trait positive population”, “case population” and “affected population” are used interchangeably herein.

An important step in the dissection of complex traits using association studies is the choice of case-control populations (see Lander and Schork, 1994). A major step in the choice of case-control populations is the clinical definition of a given trait or phenotype. Any genetic trait may be analyzed by the association method proposed here by carefully selecting the individuals to be included in the trait positive and trait negative phenotypic groups. Four criteria are often useful: clinical phenotype, age at onset, family history and severity. The selection procedure for continuous or quantitative traits (such as blood pressure for example) involves selecting individuals at opposite ends of the phenotype distribution of the trait under study, so as to include in these trait positive and trait negative populations individuals with non-overlapping phenotypes. Preferably, case-control populations consist of phenotypically homogeneous populations. Trait positive and trait negative populations consist of phenotypically uniform populations of individuals each representing between 1 and 98%, preferably between 1 and 80%, more preferably between 1 and 50%, and more preferably between 1 and 30%, most preferably between 1 and 20% of the total population under study, and preferably selected among individuals exhibiting non-overlapping phenotypes. The clearer the difference between the two trait phenotypes, the greater the probability of detecting an association with biallelic markers. The selection of those drastically different but relatively uniform phenotypes enables efficient comparisons in association studies and the possible detection of marked differences at the genetic level, provided that the sample sizes of the populations under study are large enough.

In preferred embodiments, a first group of between 50 and 300 trait positive individuals, preferably about 100 individuals, are recruited according to their phenotypes. A similar number of trait negative individuals are included in such studies.

In the present invention, typical examples of inclusion criteria include obesity and disorders related to obesity.

Association Analysis

The general strategy to perform association studies using biallelic markers derived from a region carrying a candidate gene is to scan two groups of individuals (case-control populations) in order to measure and statistically compare the allele frequencies of the biallelic markers of the present invention in both groups.

If a statistically significant association with a trait is identified for one or more of the analyzed biallelic markers, one can assume that either the associated allele is directly responsible for causing the trait (i.e. the associated allele is the trait-causing allele), or, more likely, the associated allele is in linkage disequilibrium with the trait-causing allele. The specific characteristics of the associated allele with respect to the candidate gene function usually give further insight into the relationship between the associated allele and the trait (causal or in linkage disequilibrium). If the evidence indicates that the associated allele within the candidate gene is most probably not the trait-causing allele but is in linkage disequilibrium with the real trait-causing allele, then the trait-causing allele can be found by sequencing the vicinity of the associated marker, and performing further association studies with the polymorphisms that are revealed in an iterative manner.

Association studies are usually run in two successive steps. In a first phase, the frequencies of a reduced number of biallelic markers from the candidate gene are determined in the trait positive and trait negative populations. In a second phase of the analysis, the position of the genetic loci responsible for the given trait is further refined using a higher density of markers from the relevant region. However, if the candidate gene under study is relatively small in length, as is the case for APM1, a single phase may be sufficient to establish significant associations.

Haplotype Analysis

As described above, when a chromosome carrying a disease allele first appears in a population as a result of either mutation or migration, the mutant allele necessarily resides on a chromosome having a set of linked markers: the ancestral haplotype. This haplotype can be tracked through populations and its statistical association with a given trait can be analyzed. Complementing single point (allelic) association studies with multi-point association studies, also called haplotype studies, increases the statistical power of association studies. Thus, a haplotype association study allows one to define the frequency and the type of the ancestral carrier haplotype. A haplotype analysis is important in that it increases the statistical power of an analysis involving individual markers.

In a first stage of a haplotype frequency analysis, the frequency of the possible haplotypes based on various combinations of the identified biallelic markers of the invention is determined. The haplotype frequency is then compared for distinct populations of trait positive and control individuals. The number of trait positive individuals that should be subjected to this analysis to obtain statistically significant results usually ranges between 30 and 300, with a preferred number of individuals ranging between 50 and 150. The same considerations apply to the number of unaffected individuals (or random controls) used in the study. The results of this first analysis provide haplotype frequencies in case-control populations; for each evaluated haplotype frequency, a p-value and an odds ratio are calculated. If a statistically significant association is found, the relative risk for an individual carrying the given haplotype of being affected with the trait under study can be approximated.

Interaction Analysis

The biallelic markers of the present invention may also be used to identify patterns of biallelic markers associated with detectable traits resulting from polygenic interactions. The analysis of genetic interaction between alleles at unlinked loci requires individual genotyping using the techniques described herein. The analysis of allelic interaction among a selected set of biallelic markers with an appropriate level of statistical significance can be considered as a haplotype analysis. Interaction analysis consists of stratifying the case-control populations with respect to a given haplotype for the first loci and performing a haplotype analysis with the second loci within each subpopulation.

Statistical Methods Used in Association Studies

Testing for Linkage in the Presence of Association

The biallelic markers of the present invention may further be used in TDT (transmission/disequilibrium test). TDT tests for both linkage and association and is not affected by population stratification. TDT requires data for affected individuals and their parents or data from unaffected sibs instead of from parents (see Spielmann S. et al., 1993; Schaid D. J. et al., 1996; Spielmann S. and Ewens W. J., 1998). Such combined tests generally reduce the false-positive errors produced by separate analyses.
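As an illustration only, the parent-based form of the TDT can be sketched as a McNemar-type statistic computed from heterozygous parents of affected individuals. The function name `tdt` and its two count arguments are hypothetical; the statistic (b − c)²/(b + c), referred to a chi-square distribution with one degree of freedom, is the commonly used form of the test and is assumed here rather than taken from the present text.

```python
import math


def tdt(transmitted, not_transmitted):
    """McNemar-type TDT statistic for one allele (illustrative sketch).

    transmitted     -- heterozygous parents who passed the allele to an
                       affected child
    not_transmitted -- heterozygous parents who passed the other allele
    """
    total = transmitted + not_transmitted
    if total == 0:
        raise ValueError("no informative (heterozygous) parents")
    chi2 = (transmitted - not_transmitted) ** 2 / total
    # Upper-tail probability of a chi-square variable with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


chi2, p = tdt(60, 40)
print(f"TDT chi-square = {chi2:.2f}, p = {p:.4f}")
```

Only heterozygous parents contribute, since a homozygous parent transmits the same allele regardless of linkage.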

Statistical Methods

In general, any method known in the art to test whether a trait and a genotype show a statistically significant correlation may be used.

1) Methods in Linkage Analysis

Statistical methods and computer programs useful for linkage analysis are well-known to those skilled in the art (see Terwilliger J. D. and Ott J., 1994; Ott J., 1991).

2) Methods to Estimate Haplotype Frequencies in a Population

As described above, when genotypes are scored, it is often not possible to distinguish heterozygotes so that haplotype frequencies cannot be easily inferred. When the gametic phase is not known, haplotype frequencies can be estimated from the multilocus genotypic data. Any method known to a person skilled in the art can be used to estimate haplotype frequencies (see Lange K., 1997; Weir, B. S., 1996). Preferably, maximum-likelihood haplotype frequencies are computed using an Expectation-Maximization (EM) algorithm (see Dempster et al., 1977; Excoffier L. and Slatkin M., 1995). This procedure is an iterative process aiming at obtaining maximum-likelihood estimates of haplotype frequencies from multi-locus genotype data when the gametic phase is unknown. Haplotype estimations are usually performed by applying the EM algorithm using for example the EM-HAPLO program (Hawley M. E. et al., 1994) or the Arlequin program (Schneider et al., 1997). The EM algorithm is a generalized iterative maximum likelihood approach to estimation and is briefly described below.

Please note that in the present section, “Methods To Estimate Haplotype Frequencies In A Population,” of this text, phenotypes will refer to multi-locus genotypes with unknown phase. Genotypes will refer to known-phase multi-locus genotypes.

A sample of N unrelated individuals is typed for K markers. The data observed are the unknown-phase K-locus phenotypes that can be categorized in F different phenotypes. Suppose that we have H underlying possible haplotypes (in case of K biallelic markers, $H = 2^K$).

For phenotype j, suppose that $c_j$ genotypes are possible. We thus have the following equation:

$$P_j = \sum_{i=1}^{c_j} pr(genotype_i) = \sum_{i=1}^{c_j} pr(h_k, h_l) \qquad \text{(Equation 1)}$$

where $P_j$ is the probability of the phenotype j, and $h_k$ and $h_l$ are the two haplotypes constituting the genotype i. Under the Hardy-Weinberg equilibrium, $pr(h_k, h_l)$ becomes:

$$pr(h_k, h_l) = pr(h_k)^2 \ \text{ if } h_k = h_l, \qquad pr(h_k, h_l) = 2\,pr(h_k) \cdot pr(h_l) \ \text{ if } h_k \neq h_l \qquad \text{(Equation 2)}$$

The successive steps of the E-M algorithm can be described as follows:

Starting with initial values of the haplotype frequencies, noted $p_1^{(0)}, p_2^{(0)}, \ldots, p_H^{(0)}$, these initial values serve to estimate the genotype frequencies (Expectation step) and then estimate another set of haplotype frequencies (Maximization step), noted $p_1^{(1)}, p_2^{(1)}, \ldots, p_H^{(1)}$. These two steps are iterated until changes in the sets of haplotype frequencies are very small.

A stop criterion can be that the maximum difference between haplotype frequencies between two iterations is less than 10⁻⁷. This value can be adjusted according to the desired precision of estimations.

At a given iteration s, the Expectation step consists in calculating the genotype frequencies by the following equation:

$$pr(genotype_i)^{(s)} = pr(phenotype_j) \cdot pr(genotype_i \mid phenotype_j)^{(s)} = \frac{n_j}{N} \cdot \frac{pr(h_k, h_l)^{(s)}}{P_j^{(s)}} \qquad \text{(Equation 3)}$$

where genotype i occurs in phenotype j, $n_j$ is the number of individuals in the sample showing phenotype j, and $h_k$ and $h_l$ constitute genotype i. Each probability is derived according to Equation 1 and Equation 2 described above.

Then the Maximization step simply estimates another set of haplotype frequencies given the genotype frequencies. This approach is also known as the gene-counting method (Smith, 1957).

$$p_t^{(s+1)} = \frac{1}{2} \sum_{j=1}^{F} \sum_{i=1}^{c_j} \delta_{it} \cdot pr(genotype_i)^{(s)} \qquad \text{(Equation 4)}$$

where $\delta_{it}$ is an indicator variable which counts the number of times haplotype t occurs in genotype i. It takes the values 0, 1 or 2.

To ensure that the estimation finally obtained is the maximum-likelihood estimation, several starting values are required. The estimations obtained are compared and, if they differ, the estimations leading to the best likelihood are kept.
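The Expectation and Maximization steps described in this section can be sketched as follows. This is a minimal illustration, not the EM-HAPLO or Arlequin implementation: the function names, the coding of each unphased phenotype as a tuple of per-marker allele counts (0, 1 or 2), and the single uniform starting point are all assumptions made for the example. In practice several starting points would be tried, as noted above.

```python
from collections import Counter
from itertools import product


def compatible_pairs(phenotype):
    """Unordered haplotype pairs (h_k, h_l) consistent with an unphased
    phenotype given as per-marker counts (0, 1 or 2) of allele '1'."""
    het = [i for i, g in enumerate(phenotype) if g == 1]
    base = [1 if g == 2 else 0 for g in phenotype]
    pairs = set()
    for bits in product((0, 1), repeat=len(het)):
        h1, h2 = list(base), list(base)
        for pos, b in zip(het, bits):
            h1[pos], h2[pos] = b, 1 - b
        pairs.add(tuple(sorted((tuple(h1), tuple(h2)))))
    return sorted(pairs)


def em_haplotype_frequencies(phenotypes, tol=1e-7, max_iter=1000):
    """Estimate haplotype frequencies from unphased multi-locus data."""
    n_markers = len(phenotypes[0])
    haplotypes = list(product((0, 1), repeat=n_markers))  # H = 2**K
    freq = {h: 1.0 / len(haplotypes) for h in haplotypes}  # assumed uniform start
    counts = Counter(phenotypes)  # n_j for each distinct phenotype j
    n_total = len(phenotypes)

    for _ in range(max_iter):
        new = dict.fromkeys(haplotypes, 0.0)
        for pheno, n_j in counts.items():
            pairs = compatible_pairs(pheno)
            # Equation 2: pair probabilities under Hardy-Weinberg equilibrium
            pr = [freq[a] ** 2 if a == b else 2 * freq[a] * freq[b]
                  for a, b in pairs]
            p_j = sum(pr)  # Equation 1
            for (a, b), p in zip(pairs, pr):
                weight = (n_j / n_total) * p / p_j  # Equation 3 (E step)
                new[a] += 0.5 * weight              # Equation 4 (M step);
                new[b] += 0.5 * weight              # a == b counts twice
        delta = max(abs(new[h] - freq[h]) for h in haplotypes)
        freq = new
        if delta < tol:  # stop criterion on the largest change
            break
    return freq


sample = [(2, 2)] * 4 + [(0, 0)] * 4 + [(1, 1)] * 2
for hap, p in em_haplotype_frequencies(sample).items():
    print(hap, round(p, 4))
```

In the sample above, the two double heterozygotes are ambiguous in phase; the iterations assign them almost entirely to the haplotype pair already supported by the homozygous individuals.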

3) Methods to Calculate Linkage Disequilibrium Between Markers

A number of methods can be used to calculate linkage disequilibrium between any two genetic positions. In practice, linkage disequilibrium is measured by applying a statistical association test to haplotype data taken from a population.

Linkage disequilibrium between any pair of biallelic markers comprising at least one of the biallelic markers of the present invention ($M_i$, $M_j$) having alleles ($a_i$/$b_i$) at marker $M_i$ and alleles ($a_j$/$b_j$) at marker $M_j$ can be calculated for every allele combination ($a_i$,$a_j$; $a_i$,$b_j$; $b_i$,$a_j$ and $b_i$,$b_j$), according to the Piazza formula:

$$\Delta_{a_i a_j} = \sqrt{\theta_4} - \sqrt{(\theta_4 + \theta_3)(\theta_4 + \theta_2)}$$

where:

$\theta_4$ = (− −) = frequency of genotypes not having allele $a_i$ at $M_i$ and not having allele $a_j$ at $M_j$

$\theta_3$ = (− +) = frequency of genotypes not having allele $a_i$ at $M_i$ and having allele $a_j$ at $M_j$

$\theta_2$ = (+ −) = frequency of genotypes having allele $a_i$ at $M_i$ and not having allele $a_j$ at $M_j$
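A short numerical sketch of the Piazza formula follows. The function name `piazza_delta` and the example frequencies are hypothetical and serve only to show how the three genotype-class frequencies enter the calculation.

```python
import math


def piazza_delta(theta4, theta3, theta2):
    """Piazza disequilibrium estimate from genotype-class frequencies.

    theta4 -- frequency of genotypes lacking a_i and lacking a_j
    theta3 -- frequency of genotypes lacking a_i but having a_j
    theta2 -- frequency of genotypes having a_i but lacking a_j
    """
    return math.sqrt(theta4) - math.sqrt((theta4 + theta3) * (theta4 + theta2))


# Hypothetical frequencies chosen so that the two markers are independent:
# 36% of genotypes lack a_i, 25% lack a_j, and 0.36 * 0.25 = 9% lack both.
print(piazza_delta(0.09, 0.27, 0.16))
```

When the class lacking both alleles has exactly the frequency expected from the two single-marker classes, the two square-root terms cancel and the estimate is zero.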

Linkage disequilibrium (LD) between pairs of biallelic markers ($M_i$, $M_j$) can also be calculated for every allele combination ($a_i$,$a_j$; $a_i$,$b_j$; $b_i$,$a_j$ and $b_i$,$b_j$), according to the maximum-likelihood estimate (MLE) for delta (the composite genotypic disequilibrium coefficient), as described by Weir (Weir B. S., 1996). The MLE for the composite linkage disequilibrium is:

$$D_{a_i a_j} = \frac{2n_1 + n_2 + n_3 + n_4/2}{N} - 2\,(pr(a_i) \cdot pr(a_j))$$

where $n_1$ = Σ phenotype ($a_i$/$a_i$, $a_j$/$a_j$), $n_2$ = Σ phenotype ($a_i$/$a_i$, $a_j$/$b_j$), $n_3$ = Σ phenotype ($a_i$/$b_i$, $a_j$/$a_j$), $n_4$ = Σ phenotype ($a_i$/$b_i$, $a_j$/$b_j$) and N is the number of individuals in the sample.

This formula allows linkage disequilibrium between alleles to be estimated when only genotype, and not haplotype, data are available.
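The composite estimate can be computed directly from unphased two-marker genotypes, as in the following sketch. The function name `composite_ld` and the coding of each individual as a pair of allele counts are assumptions made for illustration.

```python
def composite_ld(genotypes):
    """Composite disequilibrium estimate from unphased genotypes.

    genotypes -- list of (g_i, g_j) tuples, where g_i is the number of
                 copies (0, 1 or 2) of allele a_i at marker M_i and g_j
                 the number of copies of allele a_j at marker M_j.
    """
    n = len(genotypes)
    n1 = sum(1 for g in genotypes if g == (2, 2))  # a_i/a_i, a_j/a_j
    n2 = sum(1 for g in genotypes if g == (2, 1))  # a_i/a_i, a_j/b_j
    n3 = sum(1 for g in genotypes if g == (1, 2))  # a_i/b_i, a_j/a_j
    n4 = sum(1 for g in genotypes if g == (1, 1))  # a_i/b_i, a_j/b_j
    p_ai = sum(g[0] for g in genotypes) / (2 * n)  # allele frequency of a_i
    p_aj = sum(g[1] for g in genotypes) / (2 * n)  # allele frequency of a_j
    return (2 * n1 + n2 + n3 + n4 / 2) / n - 2 * p_ai * p_aj


# Hypothetical sample in which a_i and a_j always occur together
print(composite_ld([(2, 2)] * 5 + [(0, 0)] * 5))
```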

Another means of calculating the linkage disequilibrium between markers is as follows. For a couple of biallelic markers, $M_i$ ($a_i$/$b_i$) and $M_j$ ($a_j$/$b_j$), fitting the Hardy-Weinberg equilibrium, one can estimate the four possible haplotype frequencies in a given population according to the approach described above.

The estimation of gametic disequilibrium between $a_i$ and $a_j$ is simply:

$$D_{a_i a_j} = pr(haplotype(a_i, a_j)) - pr(a_i) \cdot pr(a_j)$$

where $pr(a_i)$ is the probability of allele $a_i$ and $pr(a_j)$ is the probability of allele $a_j$, and where $pr(haplotype(a_i, a_j))$ is estimated as in Equation 3 above.

For a couple of biallelic markers, only one measure of disequilibrium is necessary to describe the association between $M_i$ and $M_j$.

Then a normalized value of the above is calculated as follows:

$$D'_{a_i a_j} = D_{a_i a_j} / \max(-pr(a_i) \cdot pr(a_j),\ -pr(b_i) \cdot pr(b_j)) \quad \text{with } D_{a_i a_j} < 0$$

$$D'_{a_i a_j} = D_{a_i a_j} / \max(pr(b_i) \cdot pr(a_j),\ pr(a_i) \cdot pr(b_j)) \quad \text{with } D_{a_i a_j} > 0$$

The skilled person will readily appreciate that other LD calculation methods can be used.

Linkage disequilibrium among a set of biallelic markers having an adequate heterozygosity rate can be determined by genotyping between 50 and 1000 unrelated individuals, preferably between 75 and 200, more preferably around 100.

4) Testing for Association

The statistical significance of a correlation between a phenotype and a genotype, in this case an allele at a biallelic marker or a haplotype made up of such alleles, may be determined by any statistical test known in the art and with any accepted threshold of statistical significance being required. The application of particular methods and thresholds of significance is well within the skill of the ordinary practitioner of the art.

Testing for association is performed by determining the frequency of a biallelic marker allele in case and control populations and comparing these frequencies with a statistical test to determine if there is a statistically significant difference in frequency which would indicate a correlation between the trait and the biallelic marker allele under study. Similarly, a haplotype analysis is performed by estimating the frequencies of all possible haplotypes for a given set of biallelic markers in case and control populations, and comparing these frequencies with a statistical test to determine if there is a statistically significant correlation between the haplotype and the phenotype (trait) under study. Any statistical tool useful to test for a statistically significant association between a genotype and a phenotype may be used. Preferably the statistical test employed is a chi-square test with one degree of freedom. A P-value is calculated (the P-value is the probability that a statistic as large as or larger than the observed one would occur by chance).
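A chi-square test with one degree of freedom on allele counts can be sketched as below. The function name `allele_chi_square`, the 2×2 layout (allele a versus allele b, cases versus controls) and the example counts are hypothetical; no continuity correction is applied, which is a simplifying assumption of this example.

```python
import math


def allele_chi_square(case_a, case_b, control_a, control_b):
    """Pearson chi-square (1 degree of freedom) on a 2x2 allele-count table.

    case_a, case_b       -- counts of alleles a and b on case chromosomes
    control_a, control_b -- counts of alleles a and b on control chromosomes
    """
    n = case_a + case_b + control_a + control_b
    denom = ((case_a + case_b) * (control_a + control_b)
             * (case_a + control_a) * (case_b + control_b))
    if denom == 0:
        raise ValueError("a row or column of the table is empty")
    chi2 = n * (case_a * control_b - case_b * control_a) ** 2 / denom
    # Upper-tail probability of a chi-square variable with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts: allele a on 30 of 100 case chromosomes
# and on 10 of 100 control chromosomes.
chi2, p = allele_chi_square(30, 70, 10, 90)
print(f"chi-square = {chi2:.2f}, p = {p:.2e}")
```

Note that the counts are chromosomes, not individuals, so each genotyped person contributes two observations per marker.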

Statistical Significance

In preferred embodiments, for significance for diagnosis purposes, either as a positive basis for further diagnostic tests or as a preliminary starting point for early preventive therapy, the p value related to a biallelic marker association is preferably about 1×10⁻² or less, more preferably about 1×10⁻⁴ or less, for a single biallelic marker analysis and about 1×10⁻³ or less, still more preferably 1×10⁻⁶ or less and most preferably of about 1×10⁻⁸ or less, for a haplotype analysis involving two or more markers. These values are believed to be applicable to any association studies involving single or multiple marker combinations.

The skilled person can use the range of values set forth above as a starting point in order to carry out association studies with biallelic markers of the present invention. In doing so, significant associations between the biallelic markers of the present invention and obesity or disorders related to obesity can be revealed and used for diagnosis and drug screening purposes.

Phenotypic Permutation

In order to confirm the statistical significance of the first stage haplotype analysis described above, it might be suitable to perform further analyses in which genotyping data from case-control individuals are pooled and randomized with respect to the trait phenotype. The genotyping data of each individual are randomly allocated to one of two groups, which contain the same number of individuals as the case-control populations used to compile the data obtained in the first stage. A second stage haplotype analysis is preferably run on these artificial groups, preferably for the markers included in the haplotype of the first stage analysis showing the highest relative risk coefficient. This experiment is reiterated preferably at least between 100 and 10000 times. The repeated iterations allow the determination of the percentage of obtained haplotypes with a significant p-value level below about 1×10⁻³.
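The randomization procedure can be sketched as follows. To keep the example short, each individual is reduced to a 0/1 flag indicating whether the haplotype of interest is carried, and the test applied to each artificial data set is a simple carrier chi-square; both are simplifying assumptions, since the procedure described above reruns the full haplotype analysis. All function and parameter names are hypothetical.

```python
import math
import random


def carrier_p_value(cases, controls):
    """Chi-square (1 df) p-value comparing carrier counts in two groups."""
    a, b = sum(cases), len(cases) - sum(cases)
    c, d = sum(controls), len(controls) - sum(controls)
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 1.0  # no variation to test
    chi2 = (a + b + c + d) * (a * d - b * c) ** 2 / denom
    return math.erfc(math.sqrt(chi2 / 2.0))


def permutation_fraction(cases, controls, n_iter=1000, alpha=1e-3, seed=0):
    """Fraction of label-shuffled data sets reaching p < alpha."""
    rng = random.Random(seed)
    pooled = list(cases) + list(controls)
    n_cases = len(cases)  # artificial groups keep the original sizes
    hits = 0
    for _ in range(n_iter):
        rng.shuffle(pooled)  # randomize with respect to the trait phenotype
        if carrier_p_value(pooled[:n_cases], pooled[n_cases:]) < alpha:
            hits += 1
    return hits / n_iter


# Hypothetical data: 60 of 100 cases and 30 of 100 controls carry the haplotype
cases = [1] * 60 + [0] * 40
controls = [1] * 30 + [0] * 70
print("observed p-value:", carrier_p_value(cases, controls))
print("fraction of permutations with p < 1e-3:",
      permutation_fraction(cases, controls, n_iter=1000))
```

A small fraction of significant permutations supports the view that the association observed in the first stage is unlikely to be a chance result.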

Assessment of Statistical Association

To address the problem of false positives, similar analysis may be performed with the same case-control populations in random genomic regions. Results in random regions and the candidate region are compared as described in a co-pending US Provisional Patent Application entitled “Methods, Software And Apparati For Identifying Genomic Regions Harboring A Gene Associated With A Detectable Trait,” U.S. Ser. No. 60/107,986, filed Nov. 10, 1998, the contents of which are incorporated herein by reference.

5) Evaluation of Risk Factors

The association between a risk factor (in genetic epidemiology the risk factor is the presence or the absence of a certain allele or haplotype at marker loci) and a disease is measured by the odds ratio (OR) and by the relative risk (RR). If P(R⁺) is the probability of developing the disease for individuals with the risk factor and P(R⁻) is the probability for individuals without the risk factor, then the relative risk is simply the ratio of the two probabilities, that is:

RR = P(R⁺)/P(R⁻)

In case-control studies, direct measures of the relative risk cannot be obtained because of the sampling design. However, the odds ratio allows a good approximation of the relative risk for low-incidence diseases and can be calculated:

OR = (F⁺/(1−F⁺))/(F⁻/(1−F⁻))

F⁺ is the frequency of the exposure to the risk factor in cases and F⁻ is the frequency of the exposure to the risk factor in controls. F⁺ and F⁻ are calculated using the allelic or haplotype frequencies of the study and further depend on the underlying genetic model (dominant, recessive, additive, etc.).

One can further estimate the attributable risk (AR), which describes the proportion of individuals in a population exhibiting a trait due to a given risk factor. This measure is important in quantifying the role of a specific factor in disease etiology and in terms of the public health impact of a risk factor. The public health relevance of this measure lies in estimating the proportion of cases of disease in the population that could be prevented if the exposure of interest were absent. AR is determined as follows:

AR = P_E(RR−1)/(P_E(RR−1)+1)

AR is the risk attributable to a biallelic marker allele or a biallelic marker haplotype. P_E is the frequency of exposure to an allele or a haplotype within the population at large; and RR is the relative risk, which is approximated with the odds ratio when the trait under study has a relatively low incidence in the general population.
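The odds ratio and attributable risk formulas can be worked through numerically as below. The function names and the example frequencies are hypothetical; the odds ratio is used in place of the relative risk, which assumes a trait of low incidence as stated above.

```python
def odds_ratio(f_cases, f_controls):
    """OR from exposure frequencies in cases (F+) and controls (F-)."""
    return (f_cases / (1 - f_cases)) / (f_controls / (1 - f_controls))


def attributable_risk(p_exposure, relative_risk):
    """AR from the population exposure frequency P_E and the relative risk."""
    excess = p_exposure * (relative_risk - 1)
    return excess / (excess + 1)


# Hypothetical example: the haplotype is carried by 30% of cases,
# 10% of controls and 10% of the population at large.
or_estimate = odds_ratio(0.30, 0.10)
ar_estimate = attributable_risk(0.10, or_estimate)  # OR used as proxy for RR
print(f"OR = {or_estimate:.2f}, AR = {ar_estimate:.2f}")
```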

Identification of Biallelic Markers in Linkage Disequilibrium with the Biallelic Markers of the Invention

Once a first biallelic marker has been identified in a genomic region of interest, the practitioner of ordinary skill in the art, using the teachings of the present invention, can easily identify additional biallelic markers in linkage disequilibrium with this first marker. As mentioned before, any marker in linkage disequilibrium with a first marker associated with a trait will be associated with the trait. Therefore, once an association has been demonstrated between a given biallelic marker and a trait, the discovery of additional biallelic markers associated with this trait is of great interest in order to increase the density of biallelic markers in this particular region. The causal gene or mutation will be found in the vicinity of the marker or set of markers showing the highest correlation with the trait.

Identification of additional markers in linkage disequilibrium with a given marker involves: (a) amplifying a genomic fragment comprising a first biallelic marker from a plurality of individuals; (b) identifying second biallelic markers in the genomic region harboring said first biallelic marker; (c) conducting a linkage disequilibrium analysis between said first biallelic marker and second biallelic markers; and (d) selecting said second biallelic markers as being in linkage disequilibrium with said first marker. Subcombinations comprising steps (b) and (c) are also contemplated.

Methods to identify biallelic markers and to conduct linkage disequilibrium analysis are described herein and can be carried out by the skilled person without undue experimentation. The present invention then also concerns biallelic markers which are in linkage disequilibrium with the specific biallelic markers A1, A2, A3, A4, A5, A6, A7, A8, A9, A10, A11, A12, A13, A14, A15, A16, A17, A18, A19, A20, A21, A22, A23, A24, A25 and A26 and which are expected to present similar characteristics in terms of their respective association with a given trait; optionally, wherein said APM1-related biallelic marker is selected from the group consisting of A1, A2, and A7 or the group consisting of A4 and A8.

Mapping Studies: Identification of Functional Mutations

Once a positive association is confirmed with a biallelic marker of the present invention, genes in the associated candidate region (within linkage disequilibrium of the APM1 gene) can be scanned for mutations by comparing the sequences of a selected number of trait positive and trait negative individuals. In a preferred embodiment, functional regions such as exons and splice sites, promoters and other regulatory regions of the APM1 gene are scanned for mutations. Preferably, trait positive individuals carry the haplotype shown to be associated with the trait, and trait negative individuals do not carry the haplotype or allele associated with the trait. The mutation detection procedure is essentially similar to that used for biallelic site identification.

The method used to detect such mutations generally comprises the following steps: (a) amplification of a region of the candidate gene comprising a biallelic marker or a group of biallelic markers associated with the trait from DNA samples of trait positive patients and trait negative controls; (b) sequencing of the amplified region; (c) comparison of DNA sequences from trait-positive patients and trait-negative controls; and (d) determination of mutations specific to trait-positive patients. Subcombinations which comprise steps (b) and (c) are specifically contemplated.

It is preferred that candidate polymorphisms be then verified by screening a larger population of cases and controls by means of any genotyping procedure such as those described herein, preferably using a microsequencing technique in an individual test format. Polymorphisms are considered as candidate mutations when present in cases and controls at frequencies compatible with the expected association results.

Biallelic Markers of the Invention in Methods of Genetic Diagnostics

The biallelic markers of the present invention can also be used to develop diagnostic tests capable of identifying individuals who express a detectable trait as the result of a specific genotype or individuals whose genotype places them at risk of developing a detectable trait at a subsequent time.

It will of course be understood by practitioners skilled in the treatment or diagnosis of obesity and disorders related to obesity that the present invention does not intend to provide an absolute identification of individuals who could be at risk of developing a particular disease involving obesity and disorders related to obesity but rather to indicate a certain degree or likelihood of developing a disease. However, this information is extremely valuable as it can, in certain circumstances, be used to initiate preventive treatments or to allow an individual carrying a significant haplotype to foresee warning signs such as minor symptoms. In diseases in which attacks may be extremely violent and sometimes fatal if not treated on time, the knowledge of a potential predisposition, even if this predisposition is not absolute, might contribute in a very significant manner to treatment efficacy.

The diagnostic techniques of the present invention may employ a variety of methodologies to determine whether a test subject has a biallelic marker pattern associated with an increased risk of developing a detectable trait or whether the individual suffers from a detectable trait as a result of a particular mutation, including methods which enable the analysis of individual chromosomes for haplotyping, such as family studies, single sperm DNA analysis or somatic hybrids. The trait analyzed using the present diagnostics may be any detectable trait, including obesity and disorders related to obesity.

Another aspect of the present invention relates to a method of determining whether an individual is at risk of developing a trait or whether an individual expresses a trait as a consequence of possessing a particular trait-causing allele. The present invention also relates to a method of determining whether an individual is at risk of developing a plurality of traits or whether an individual expresses a plurality of traits as a result of possessing a particular trait-causing allele. These methods involve obtaining a nucleic acid sample from the individual and determining whether the nucleic acid sample contains one or more alleles of one or more biallelic markers indicative of a risk of developing the trait or indicative that the individual expresses the trait as a result of possessing a particular trait-causing allele. These methods also involve obtaining a nucleic acid sample from the individual and determining whether the nucleic acid sample contains at least one allele or at least one biallelic marker haplotype indicative of a risk of developing the trait or indicative that the individual expresses the trait as a result of possessing a particular APM1 polymorphism or mutation (trait-causing allele).

Preferably, in such diagnostic methods, a nucleic acid sample is obtained from the individual and this sample is genotyped using methods described above in “Methods Of Genotyping DNA Samples For Biallelic Markers”. The diagnostics may be based on a single biallelic marker or on a group of biallelic markers. In each of these methods, a nucleic acid sample is obtained from the test subject and the biallelic marker pattern of one or more of the biallelic markers A1 to A26 is determined. Alternatively, the one or more biallelic markers are selected from the group consisting of A1, A2, A4, A7, and A8. Alternatively, one or more biallelic markers are selected from the group consisting of A1, A2, and A7.

In one embodiment, a PCR amplification is conducted on the nucleic acid sample to amplify regions in which polymorphisms associated with a detectable phenotype have been identified. The amplification products are sequenced to determine whether the individual possesses one or more APM1 polymorphisms associated with a detectable phenotype. The primers used to generate amplification products may comprise the primers listed in Table 1. Alternatively, the nucleic acid sample is subjected to microsequencing reactions as described above to determine whether the individual possesses one or more APM1 polymorphisms associated with a detectable phenotype resulting from a mutation or a polymorphism in the APM1 gene. The primers used in the microsequencing reactions may include the primers listed in Table 4.

In another embodiment, the nucleic acid sample is contacted with one or more allele specific oligonucleotide probes which specifically hybridize to one or more APM1 alleles associated with a detectable phenotype. The probes used in the hybridization assay may include the probes listed in Table 3. In another embodiment, the nucleic acid sample is contacted with a second APM1 oligonucleotide capable of producing an amplification product when used with the allele specific oligonucleotide in an amplification reaction. The presence of an amplification product in the amplification reaction indicates that the individual possesses one or more APM1 alleles associated with a detectable phenotype.

As described herein, the diagnostics may be based on a single biallelic marker or a group of biallelic markers. Preferably, the biallelic marker or combination of biallelic markers is selected from the group consisting of A1 to A26 and the complements thereof or any combination or subset thereof. More preferably, the one or more biallelic markers are selected from the group consisting of A1, A2, A4, A7, and A8, and the complements thereof or any combination or subset thereof. Alternatively, the one or more biallelic markers are selected from the group consisting of A1, A2, and A7. Diagnostic kits comprise any of the polynucleotides of the present invention.

These diagnostic methods are extremely valuable as they can, in certaincircumstances, be used to initiate preventive treatments or to allow anindividual carrying a significant genotype or haplotype to foreseewarning signs such as minor symptoms. For example, in the studydescribed in Example 6, the subjects were all adolescent girls who didnot yet have significant disease. However, by identifying the girls asadolescents who are at risk for obesity and obesity-related diseases anddisorders later in their life, they could be targeted now for moreintensive treatment to prevent the onset of later severe disease, suchas diabetes, or cardiovascular complications, or any of the otherobesity-related diseases discussed herein. An association has been shownbetween APM1 markers and indicators of obesity and diabetes,specifically, as well as indicating susceptibility to other relateddiseases (Example 6).

Diagnostics that analyze and predict response to a drug or side effects to a drug may be used to determine whether an individual should be treated with a particular drug. For example, if the diagnostic indicates a likelihood that an individual will respond positively to treatment with a particular drug, the drug may be administered to the individual. Conversely, if the diagnostic indicates that an individual is likely to respond negatively to treatment with a particular drug, an alternative course of treatment may be prescribed. A negative response may be defined as either the absence of an efficacious response or the presence of toxic side effects. For example, in the study described in Example 6, the identified APM1 markers would be useful for genotyping a population of obese people to determine which people are more likely to be susceptible to drugs designed to lower leptin levels or free fatty acid levels. Other associations between APM1 markers and other traits associated with obesity can also be determined using the methods of the invention without undue experimentation and would indicate other markers useful to identify sub-populations of people likely to be susceptible (or not) to a drug targeting those traits. In addition, specific association studies can be performed looking at drug outcome (treatment/side effect) to identify other useful markers for predicting risks/successful treatment.

Clinical drug trials represent another application for the markers of the present invention. One or more markers indicative of response to an agent acting against an obesity-related disease or to side effects to an agent acting against an obesity-related disease may be identified using the methods described above. Thereafter, potential participants in clinical trials of such an agent may be screened to identify those individuals most likely to respond favorably to the drug and/or exclude those likely to experience side effects. In that way, the effectiveness of drug treatment may be measured in individuals who have the potential to respond positively to the drug, without lowering the measurement as a result of the inclusion of individuals who are unlikely to respond positively in the study and/or without risking undesirable safety problems.

Based on Example 6 herein, subgroups for clinical trials could be identified that had the rare allele of biallelic markers A1 and/or A2, any or all of the biallelic markers A4, A7 and A8, or both sets of biallelic markers. The first set of markers was shown to be associated with increased leptin levels, and the second set was associated with higher free fatty acid levels in obese girls. Having the rare allele from either of the sets of markers indicated a higher risk of obesity in later life compared with a group of individuals of the same ethnic background who remained thin throughout life (data not shown). Thus these markers can be used to predict patients who might be susceptible to drugs designed to target/ameliorate these symptoms.

The methods of the invention can also be used to identify other markers and find other associations with traits associated with obesity, such as hypertriglyceridemia or hypertension, for example.

Recombinant Vectors

The term “vector” is used herein to designate either a circular or a linear DNA or RNA molecule, which is either double-stranded or single-stranded, and which comprises at least one polynucleotide of interest that is sought to be transferred into a cell host or into a unicellular or multicellular host organism.

The present invention encompasses a family of recombinant vectors that comprise a regulatory polynucleotide derived from the APM1 genomic sequence, or a coding polynucleotide from the APM1 genomic sequence. Consequently, the present invention further deals with a recombinant vector comprising either a regulatory polynucleotide comprised in the nucleic acids of SEQ ID Nos 2 and 3 or a polynucleotide comprising the APM1 coding sequence or both.

In a first preferred embodiment, a recombinant vector of the invention is used to amplify, in a suitable cell host, the inserted polynucleotide derived from an APM1 genomic sequence selected from the group consisting of the nucleic acids of SEQ ID No 1, 2 and 3 or an APM1 cDNA, for example the cDNA of SEQ ID NO 5, this polynucleotide being amplified each time that the recombinant vector replicates.

Generally, a recombinant vector of the invention may comprise any of the polynucleotides described herein, including regulatory sequences and coding sequences, as well as any APM1 primer or probe as defined above.

A second preferred embodiment of the recombinant vectors according to the invention consists of expression vectors comprising either a regulatory polynucleotide or a coding nucleic acid of the invention, or both. Within certain embodiments, expression vectors are employed to express the APM1 polypeptide, which can then be purified and, for example, used in ligand screening assays or as an immunogen in order to raise specific antibodies directed against the APM1 protein. In other embodiments, the expression vectors are used for constructing transgenic animals and also for gene therapy. Expression requires that appropriate signals are provided in the vectors, said signals including various regulatory elements, such as enhancers/promoters from both viral and mammalian sources that drive expression of the genes of interest in host cells. Dominant drug selection markers for establishing permanent, stable cell clones expressing the products are generally included in the expression vectors of the invention, as are elements that link expression of the drug selection markers to expression of the polypeptide.

More particularly, the present invention relates to expression vectors which include nucleic acids encoding an APM1 protein, preferably the APM1 protein of the amino acid sequence of SEQ ID No 6 or variants or fragments thereof, under the control of a regulatory sequence selected among the APM1 regulatory polynucleotides, or alternatively under the control of an exogenous regulatory sequence.

Consequently, preferred expression vectors of the invention are those in which either: (a) the APM1 regulatory sequence comprised therein drives the expression of a coding polynucleotide operably linked thereto; or (b) the APM1 coding sequence is operably linked to regulatory sequences allowing its expression in a suitable cell host and/or host organism.

A recombinant expression vector comprising a nucleic acid selected from the group consisting of SEQ ID No 2, or biologically active fragments or variants thereof, is also part of the present invention.

In a preferred embodiment, a recombinant expression vector of the invention comprises a regulatory nucleotide sequence selected from the group consisting of:

(i) a nucleotide sequence comprising a polynucleotide of SEQ ID NO 2 or a complementary sequence thereto;

(ii) a nucleotide sequence comprising a polynucleotide having at least 95% of nucleotide identity with the nucleotide sequence of SEQ ID No 2 or a complementary sequence thereto;

(iii) a nucleotide sequence comprising a polynucleotide that hybridizes under stringent hybridization conditions with the nucleotide sequence of SEQ ID No 2 or a complementary sequence thereto; and

(iv) a biologically active fragment or variant of the polynucleotides in (i), (ii) and (iii).

The invention also encompasses a recombinant expression vector comprising:

a) a nucleic acid comprising a regulatory nucleotide sequence selected from the group consisting of:

(i) a nucleotide sequence comprising a polynucleotide of SEQ ID NO 2 or a complementary sequence thereto;

(ii) a nucleotide sequence comprising a polynucleotide having at least 95% of nucleotide identity with the nucleotide sequence of SEQ ID No 2 or a complementary sequence thereto;

(iii) a nucleotide sequence comprising a polynucleotide that hybridizes under stringent hybridization conditions with the nucleotide sequence of SEQ ID No 2 or a complementary sequence thereto; and

(iv) a biologically active fragment or variant of the polynucleotides in (i), (ii) and (iii); and

b) a polynucleotide encoding a desired polypeptide or nucleic acid of interest, operably linked to the nucleic acid defined in (a) above.
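The "at least 95% of nucleotide identity" criterion recited above can be illustrated with a small calculation. This is a minimal sketch under stated assumptions: the two sequences are assumed to be already aligned to equal length with "-" marking gaps, a base opposite a gap is counted as a mismatch, and the function names are hypothetical. It is not the identity algorithm defined elsewhere in the specification.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity between two pre-aligned sequences of equal length.

    Columns where both sequences have a gap are ignored; a gap opposite
    a base counts as a mismatch (an assumption of this sketch).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if not (a == "-" and b == "-")
    ]
    if not columns:
        raise ValueError("no comparable positions")
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


def meets_identity_threshold(aligned_a, aligned_b, threshold=95.0):
    """True when the identity is at least the threshold (95% by default)."""
    return percent_identity(aligned_a, aligned_b) >= threshold
```

For example, two aligned 20-nucleotide sequences differing at a single position are 95% identical and would just satisfy the threshold, whereas two differences would not.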

Additionally, the recombinant expression vector described above may also comprise a nucleic acid comprising a 3′-regulatory polynucleotide, preferably a 3′-regulatory polynucleotide of the APM1 gene. The APM1 3′-regulatory polynucleotide may also comprise the 3′-UTR sequence contained in the nucleotide sequence of SEQ ID NO 5.

The 5′-regulatory polynucleotide may also include the 5′-UTR sequence of the APM1 cDNA, or a biologically active fragment or variant thereof.

The invention also pertains to a recombinant expression vector useful for the expression of the APM1 coding sequence, wherein said vector comprises a nucleic acid of SEQ ID No 5.

Another preferred recombinant expression vector consists of a vector for expressing an APM1 coding sequence, wherein said vector comprises a nucleic acid of SEQ ID No 1 or a fragment thereof or a nucleic acid having at least 95% nucleotide identity with a polynucleotide of SEQ ID No 1 or a fragment thereof.

Recombinant vectors comprising a nucleic acid containing an APM1-related biallelic marker are also part of the invention. In a preferred embodiment, said biallelic marker is selected from the group consisting of A1, A2, A3, A4, A5, A6, A7, A8, A9, A10, A11, A12, A13, A14, A15, A16, A17, A18, A19, A20, A21, A22, A23, A24, A25 and A26, and the complements thereof; optionally, said APM1-related biallelic marker is selected from the group consisting of A1, A2, and A7 or the group consisting of A4 and A8.

Some of the elements which can be found in the vectors of the present invention are described in further detail in the following sections.

1. General Features of the Expression Vectors of the Invention

A recombinant vector according to the invention comprises, but is not limited to, a YAC (Yeast Artificial Chromosome), a BAC (Bacterial Artificial Chromosome), a phage, a phagemid, a cosmid, a plasmid or even a linear DNA molecule which may consist of chromosomal, non-chromosomal, semi-synthetic or synthetic DNA. Such a recombinant vector can comprise a transcriptional unit comprising an assembly of:

(1) a genetic element or elements having a regulatory role in gene expression, for example promoters or enhancers. Enhancers are cis-acting elements of DNA, usually from about 10 to 300 bp in length, that act on the promoter to increase transcription;

(2) a structural or coding sequence which is transcribed into mRNA and eventually translated into a polypeptide, said structural or coding sequence being operably linked to the regulatory elements described in (1); and

(3) appropriate transcription initiation and termination sequences. Structural units intended for use in yeast or eukaryotic expression systems preferably include a leader sequence enabling extracellular secretion of translated protein by a host cell. Alternatively, when a recombinant protein is expressed without a leader or transport sequence, it may include an N-terminal residue. This residue may or may not be subsequently cleaved from the expressed recombinant protein to provide a final product.

Generally, recombinant expression vectors will include origins of replication, selectable markers permitting transformation of the host cell, and a promoter derived from a highly expressed gene to direct transcription of a downstream structural sequence. The heterologous structural sequence is assembled in appropriate phase with translation initiation and termination sequences, and preferably a leader sequence capable of directing secretion of the translated protein into the periplasmic space or the extracellular medium. In a specific embodiment wherein the vector is adapted for transfecting and expressing desired sequences in mammalian host cells, preferred vectors will comprise an origin of replication in the desired host, a suitable promoter and enhancer, and also any necessary ribosome binding sites, polyadenylation site, splice donor and acceptor sites, transcriptional termination sequences, and 5′-flanking non-transcribed sequences. DNA sequences derived from the SV40 viral genome, for example SV40 origin, early promoter, enhancer, splice and polyadenylation sites, may be used to provide the required non-transcribed genetic elements.

The in vivo expression of an APM1 polypeptide of SEQ ID No 6 or fragments or variants thereof may be useful in order to correct a genetic defect related to the expression of the native gene in a host organism or to the production of a biologically inactive APM1 protein.

Consequently, the present invention also deals with recombinant expression vectors mainly designed for the in vivo production of the APM1 polypeptide of SEQ ID No 6 or fragments or variants thereof by the introduction of the appropriate genetic material into the organism of the patient to be treated. This genetic material may be introduced in vitro into a cell that has been previously extracted from the organism, the modified cell being subsequently reintroduced into the said organism, or directly in vivo into the appropriate tissue.

2. Regulatory Elements

Promoters

The suitable promoter regions used in the expression vectors according to the present invention are chosen taking into account the cell host in which the heterologous gene has to be expressed. The particular promoter employed to control the expression of a nucleic acid sequence of interest is not believed to be important, so long as it is capable of directing the expression of the nucleic acid in the targeted cell. Thus, where a human cell is targeted, it is preferable to position the nucleic acid coding region adjacent to and under the control of a promoter that is capable of being expressed in a human cell, such as, for example, a human or a viral promoter.

A suitable promoter may be heterologous with respect to the nucleic acid for which it controls the expression or alternatively can be endogenous to the native polynucleotide containing the coding sequence to be expressed. Additionally, the promoter is generally heterologous with respect to the recombinant vector sequences within which the construct promoter/coding sequence has been inserted.

Promoter regions can be selected from any desired gene using, for example, CAT (chloramphenicol acetyltransferase) vectors and more preferably pKK232-8 and pCM7 vectors.

Preferred bacterial promoters are the LacI, LacZ, the T3 or T7 bacteriophage RNA polymerase promoters, the gpt, lambda PR, PL and trp promoters (EP 0036776), the polyhedrin promoter, or the p10 protein promoter from baculovirus (Kit Novagen) (Smith et al., 1983; O'Reilly et al., 1992), or also the trc promoter.

Eukaryotic promoters include CMV immediate early, HSV thymidine kinase, early and late SV40, LTRs from retrovirus, and mouse metallothionein-I. Selection of a convenient vector and promoter is well within the level of ordinary skill in the art.

The choice of a promoter is well within the ability of a person skilled in the field of genetic engineering (Sambrook et al. (1989) and Fuller et al. (1996)).

Other Regulatory Elements

Where a cDNA insert is employed, one will typically desire to include a polyadenylation signal to effect proper polyadenylation of the gene transcript. The nature of the polyadenylation signal is not believed to be crucial to the successful practice of the invention, and any such sequence may be employed, such as human growth hormone and SV40 polyadenylation signals. Also contemplated as an element of the expression cassette is a terminator. These elements can serve to enhance message levels and to minimize read through from the cassette into other sequences.

The vector containing the appropriate DNA sequence as described above, more preferably an APM1 gene regulatory polynucleotide, a polynucleotide encoding the APM1 polypeptide selected from the group consisting of SEQ ID No 1 or a fragment or a variant thereof and SEQ ID No 5, or both of them, can be utilized to transform an appropriate host to allow the expression of the desired polypeptide or polynucleotide.

3. Selectable Markers

Selectable markers confer an identifiable change to the cell, permitting easy identification of cells containing the expression construct. The selectable marker genes for selection of transformed host cells are preferably dihydrofolate reductase or neomycin resistance for eukaryotic cell culture, TRP1 for S. cerevisiae or tetracycline, rifampicin or ampicillin resistance in E. coli, or levan saccharase for mycobacteria, this latter marker being a negative selection marker.

4. Preferred Vectors

Bacterial Vectors

As a representative but non-limiting example, useful expression vectors for bacterial use can comprise a selectable marker and a bacterial origin of replication derived from commercially available plasmids comprising genetic elements of pBR322 (ATCC 37017). Such commercial vectors include, for example, pKK223-3 (Pharmacia, Uppsala, Sweden), and GEM1 (Promega Biotec, Madison, Wis., USA).

Large numbers of other suitable vectors are known to those of skill in the art, and commercially available, such as the following bacterial vectors: pQE70, pQE60, pQE-9 (Qiagen), pbs, pD10, phagescript, psiX174, pbluescript SK, pbsks, pNH8A, pNH16A, pNH18A, pNH46A (Stratagene); ptrc99a, pKK223-3, pKK233-3, pDR540, pRIT5 (Pharmacia); pWLNEO, pSV2CAT, pOG44, pXT1, pSG (Stratagene); pSVK3, pBPV, pMSG, pSVL (Pharmacia); pQE-30 (QIAexpress).

Bacteriophage Vectors

The P1 bacteriophage vector may contain large inserts ranging from about 80 to about 100 kb.

The construction of P1 bacteriophage vectors such as p158 or p158/neo8 is notably described by Sternberg (1992, 1994). Recombinant P1 clones comprising APM1 nucleotide sequences may be designed for inserting large polynucleotides of more than 40 kb (Linton et al., 1993). To generate P1 DNA for transgenic experiments, a preferred protocol is the protocol described by McCormick et al. (1994). Briefly, E. coli (preferably strain NS3529) harboring the P1 plasmid are grown overnight in a suitable broth medium containing 25 μg/ml of kanamycin. The P1 DNA is prepared from the E. coli by alkaline lysis using the Qiagen Plasmid Maxi kit (Qiagen, Chatsworth, Calif., USA), according to the manufacturer's instructions. The P1 DNA is purified from the bacterial lysate on two Qiagen-tip 500 columns, using the washing and elution buffers contained in the kit. A phenol/chloroform extraction is then performed before precipitating the DNA with 70% ethanol. After solubilizing the DNA in TE (10 mM Tris-HCl, pH 7.4, 1 mM EDTA), the concentration of the DNA is assessed by spectrophotometry.
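The final spectrophotometric step can be illustrated with a short calculation. This is a sketch only: it assumes the common laboratory convention that an absorbance of 1.0 at 260 nm in a 1 cm path corresponds to about 50 μg/ml of double-stranded DNA, which is not stated in the protocol above, and the function name and example readings are hypothetical.

```python
# Assumed convention (not from the protocol above): A260 of 1.0 in a 1 cm
# path length corresponds to roughly 50 micrograms per ml of double-stranded DNA.
DSDNA_UG_PER_ML_PER_A260 = 50.0


def dsdna_concentration_ug_per_ml(a260, dilution_factor=1.0, path_length_cm=1.0):
    """Estimate the dsDNA concentration of the undiluted stock in ug/ml."""
    if a260 < 0 or dilution_factor <= 0 or path_length_cm <= 0:
        raise ValueError("absorbance must be >= 0; dilution and path must be > 0")
    return a260 / path_length_cm * DSDNA_UG_PER_ML_PER_A260 * dilution_factor


# Hypothetical reading: A260 = 0.25 measured on a 1:20 dilution of the stock.
stock_concentration = dsdna_concentration_ug_per_ml(0.25, dilution_factor=20)
```

With these made-up numbers the stock works out to 0.25 x 50 x 20 = 250 μg/ml.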

To express a P1 clone comprising APM1 nucleotide sequences in a transgenic animal, typically transgenic mice, it is desirable to remove vector sequences from the P1 DNA fragment, for example by cleaving the P1 DNA at rare-cutting sites within the P1 polylinker (SfiI, NotI or SalI). The P1 insert is then purified from vector sequences on a pulsed-field agarose gel, using methods similar to those originally reported for the isolation of DNA from YACs (Schedl et al., 1993a; Peterson et al., 1993). At this stage, the resulting purified insert DNA can be concentrated, if necessary, on a Millipore Ultrafree-MC Filter Unit (Millipore, Bedford, Mass., USA; 30,000 molecular weight limit) and then dialyzed against microinjection buffer (10 mM Tris-HCl, pH 7.4; 250 μM EDTA) containing 100 mM NaCl, 30 μM spermine, 70 μM spermidine on a microdialysis membrane (type VS, 0.025 μm from Millipore). The intactness of the purified P1 DNA insert is assessed by electrophoresis on a 1% agarose (Sea Kem GTG; FMC Bio-products) pulsed-field gel and staining with ethidium bromide.

Baculovirus Vectors

A suitable vector for the expression of the APM1 polypeptide of SEQ ID No 6 or fragments or variants thereof is a baculovirus vector that can be propagated in insect cells and in insect cell lines. A specific suitable host vector system is the pVL1392/1393 baculovirus transfer vector (Pharmingen) that is used to transfect the SF9 cell line (ATCC N^(o) CRL 1711) which is derived from Spodoptera frugiperda.

Other suitable vectors for the expression of the APM1 polypeptide of SEQ ID No 6 or fragments or variants thereof in a baculovirus expression system include those described by Chai et al. (1993), Vlasak et al. (1983) and Lenhard et al. (1996).

Viral Vectors

In one specific embodiment, the vector is derived from an adenovirus. Preferred adenovirus vectors according to the invention are those described by Feldman and Steg (1996) or Ohno et al. (1994). Another preferred recombinant adenovirus according to this specific embodiment of the present invention is the human adenovirus type 2 or 5 (Ad 2 or Ad 5) or an adenovirus of animal origin (French patent application N^(o) FR-93.05954).

Retrovirus vectors and adeno-associated virus vectors are generally understood to be the recombinant gene delivery systems of choice for the transfer of exogenous polynucleotides in vivo, particularly to mammals, including humans. These vectors provide efficient delivery of genes into cells, and the transferred nucleic acids are stably integrated into the chromosomal DNA of the host.

Particularly preferred retroviruses for the preparation or construction of retroviral in vitro or in vivo gene delivery vehicles of the present invention include retroviruses selected from the group consisting of Mink-Cell Focus Inducing Virus, Murine Sarcoma Virus, Reticuloendotheliosis virus and Rous Sarcoma virus. Particularly preferred Murine Leukemia Viruses include the 4070A and the 1504A viruses, Abelson (ATCC No VR-999), Friend (ATCC No VR-245), Gross (ATCC No VR-590), Rauscher (ATCC No VR-998) and Moloney Murine Leukemia Virus (ATCC No VR-190; PCT Application No WO 94/24298). Particularly preferred Rous Sarcoma Viruses include Bryan high titer (ATCC Nos VR-334, VR-657, VR-726, VR-659 and VR-728). Other preferred retroviral vectors are those described in Roth et al. (1996), PCT Application No WO 93/25234, PCT Application No WO 94/06920, Roux et al., 1989, Julan et al., 1992 and Neda et al., 1991.

Yet another viral vector system that is contemplated by the invention consists of the adeno-associated virus (AAV). The adeno-associated virus is a naturally occurring defective virus that requires another virus, such as an adenovirus or a herpes virus, as a helper virus for efficient replication and a productive life cycle (Muzyczka et al., 1992). It is also one of the few viruses that may integrate its DNA into non-dividing cells, and exhibits a high frequency of stable integration (Flotte et al., 1992; Samulski et al., 1989; McLaughlin et al., 1989). One advantageous feature of AAV derives from its reduced efficacy for transducing primary cells relative to transformed cells.

BAC Vectors

The bacterial artificial chromosome (BAC) cloning system (Shizuya et al., 1992) has been developed to stably maintain large fragments of genomic DNA (100-300 kb) in E. coli. A preferred BAC vector is the pBeloBAC11 vector that has been described by Kim et al. (1996). BAC libraries are prepared with this vector using size-selected genomic DNA that has been partially digested using enzymes that permit ligation into either the BamHI or HindIII sites in the vector. Flanking these cloning sites are T7 and SP6 RNA polymerase transcription initiation sites that can be used to generate end probes by either RNA transcription or PCR methods. After the construction of a BAC library in E. coli, BAC DNA is purified from the host cell as a supercoiled circle. Converting these circular molecules into a linear form precedes both size determination and introduction of the BACs into recipient cells. The cloning site is flanked by two NotI sites, permitting cloned segments to be excised from the vector by NotI digestion. Alternatively, the DNA insert contained in the pBeloBAC11 vector may be linearized by treatment of the BAC vector with the commercially available enzyme lambda terminase, which cleaves at the unique cosN site, but this cleavage method results in a full length BAC clone containing both the insert DNA and the BAC sequences.

5. Delivery of the Recombinant Vectors

In order to effect expression of the polynucleotides and polynucleotide constructs of the invention, these constructs must be delivered into a cell. This delivery may be accomplished in vitro, as in laboratory procedures for transforming cell lines, or in vivo or ex vivo, as in the treatment of certain disease states.

One mechanism is viral infection, where the expression construct is encapsulated in an infectious viral particle.

Several non-viral methods for the transfer of polynucleotides into cultured mammalian cells are also contemplated by the present invention, and include, without being limited to, calcium phosphate precipitation (Graham et al., 1973; Chen et al., 1987), DEAE-dextran (Gopal, 1985), electroporation (Tur-Kaspa et al., 1986; Potter et al., 1984), direct microinjection (Harland et al., 1985), DNA-loaded liposomes (Nicolau et al., 1982; Fraley et al., 1979), and receptor-mediated transfection (Wu and Wu, 1987; 1988). Some of these techniques may be successfully adapted for in vivo or ex vivo use.

Once the expression polynucleotide has been delivered into the cell, it may be stably integrated into the genome of the recipient cell. This integration may be in the cognate location and orientation via homologous recombination (gene replacement) or it may be integrated in a random, non-specific location (gene augmentation). In yet further embodiments, the nucleic acid may be stably maintained in the cell as a separate, episomal segment of DNA. Such nucleic acid segments or “episomes” encode sequences sufficient to permit maintenance and replication independent of or in synchronization with the host cell cycle.

One specific embodiment for a method for delivering a protein or peptide to the interior of a cell of a vertebrate in vivo comprises the step of introducing a preparation comprising a physiologically acceptable carrier and a naked polynucleotide operatively coding for the polypeptide of interest into the interstitial space of a tissue comprising the cell, whereby the naked polynucleotide is taken up into the interior of the cell and has a physiological effect. This is particularly applicable for transfer in vitro but it may be applied in vivo as well.

Compositions for use in vitro and in vivo comprising a “naked” polynucleotide are described in PCT application N^(o) WO 90/11092 (Vical Inc.) and also in PCT application No. WO 95/11307 (Institut Pasteur, INSERM, Université d'Ottawa) as well as in the articles of Tacson et al. (1996) and of Huygen et al. (1996).

In still another embodiment of the invention, the transfer of a naked polynucleotide of the invention, including a polynucleotide construct of the invention, into cells may be carried out by particle bombardment (biolistic), said particles being DNA-coated microprojectiles accelerated to a high velocity allowing them to pierce cell membranes and enter cells without killing them, such as described by Klein et al. (1987).

In a further embodiment, the polynucleotide of the invention may be entrapped in a liposome (Ghosh and Bacchawat, 1991; Wong et al., 1980; Nicolau et al., 1987).

In a specific embodiment, the invention provides a composition for the in vivo production of the APM1 protein or polypeptide described herein. It comprises a naked polynucleotide operatively coding for this polypeptide, in solution in a physiologically acceptable carrier, and suitable for introduction into a tissue to cause cells of the tissue to express the said protein or polypeptide.

The amount of vector to be injected into the desired host organism varies according to the site of injection. As an indicative dose, between 0.1 and 100 μg of the vector is injected into an animal body, preferably a mammal body, for example a mouse body.

In another embodiment of the vector according to the invention, it may be introduced in vitro into a host cell, preferably a host cell previously harvested from the animal to be treated and more preferably a somatic cell such as a muscle cell. In a subsequent step, the cell that has been transformed with the vector coding for the desired APM1 polypeptide or the desired fragment thereof is reintroduced into the animal body in order to deliver the recombinant protein within the body either locally or systemically.

Cell Hosts

Another object of the invention consists of a host cell that has been transformed or transfected with one of the polynucleotides described herein, and more precisely a polynucleotide comprising either an APM1 regulatory polynucleotide or the coding sequence of the APM1 polypeptide selected from the group consisting of SEQ ID No 1 or a fragment or a variant thereof and SEQ ID No 5. Also included are host cells that are transformed (prokaryotic cells) or transfected (eukaryotic cells) with a recombinant vector such as one of those described above.

Generally, a recombinant host cell of the invention comprises any one of the polynucleotides or the recombinant vectors described herein.

A preferred recombinant host cell according to the invention comprises a polynucleotide selected from the following group of polynucleotides:

a) a purified or isolated nucleic acid encoding an APM1 polypeptide, or a polypeptide fragment or variant thereof;

b) a purified or isolated nucleic acid comprising at least 8, preferably at least 15, more preferably at least 25, consecutive nucleotides of a nucleotide sequence selected from the group consisting of:

1) the nucleotide sequence beginning at the nucleotide in position 1 and ending at the nucleotide in position 4811 of the nucleotide sequence of SEQ ID No 1 or a variant thereof or a sequence complementary thereto; more particularly, the nucleotide sequence beginning at the nucleotide in position 1 and ending at the nucleotide in position 3529 of the nucleotide sequence of SEQ ID No 1 or a variant thereof or a sequence complementary thereto;

2) the nucleotide sequence beginning at the nucleotide in position 4852 and ending at the nucleotide in position 15142 of the nucleotide sequence of SEQ ID No 1 or a variant thereof or a sequence complementary thereto;

3) the nucleotide sequence beginning at the nucleotide in position 15366 and ending at the nucleotide in position 16276 of the nucleotide sequence of SEQ ID No 1 or a variant thereof or a sequence complementary thereto; and

4) the nucleotide sequence beginning at the nucleotide in position 20560 and ending at the nucleotide in position 20966 of the nucleotide sequence of SEQ ID No 1 or a variant thereof or a sequence complementary thereto;

c) a purified or isolated nucleic acid comprising at least 8 consecutive nucleotides, preferably at least 15, of the nucleotide sequence beginning at the nucleotide in position 1 and ending at the nucleotide in position 22 of the nucleotide sequence of SEQ ID No 5 or a variant thereof or a sequence complementary thereto;

d) a purified or isolated nucleic acid comprising an exon of the APM1 gene, a sequence complementary thereto or a variant thereof;

e) a purified or isolated nucleic acid comprising a combination of at least two exons of the APM1 gene, or the sequences complementary thereto, wherein the polynucleotides are arranged within the nucleic acid, from the 5′ end to the 3′ end of said nucleic acid, in the same order as in SEQ ID No 1;

f) a purified or isolated nucleic acid comprising the nucleotide sequence SEQ ID No 2 or the sequences complementary thereto or a biologically active fragment or a variant thereof;

g) a purified or isolated nucleic acid comprising the nucleotide sequence SEQ ID No 3, or the sequence complementary thereto or a biologically active fragment or a variant thereof;

h) a polynucleotide consisting of:

-   -   (1) a nucleic acid comprising a regulatory polynucleotide of SEQ ID No 2 or the sequences complementary thereto or a biologically active fragment or variant thereof;
    -   (2) a polynucleotide encoding a desired polypeptide or nucleic acid; or
    -   (3) optionally, a nucleic acid comprising a regulatory polynucleotide of SEQ ID No 3, or the sequence complementary thereto or a biologically active fragment or variant thereof; and

i) a DNA construct as described previously in the present specification.

Another preferred recombinant cell host according to the present invention is characterized in that its genome or genetic background (including chromosome, plasmids) is modified by the nucleic acid coding for the APM1 polypeptide of SEQ ID No 5 or fragments or variants thereof.

A further recombinant cell host according to the invention comprises a polynucleotide containing a biallelic marker selected from the group consisting of A1, A2, A3, A4, A5, A6, A7, A8, A9, A10, A11, A12, A13, A14, A15, A16, A17, A18, A19, A20, A21, A22, A23, A24, A25 and A26, and the complements thereof; optionally, wherein said APM1-related biallelic marker is selected from the group consisting of A1, A2, and A7 or the group consisting of A4 and A8.

Preferred host cells used as recipients for the expression vectors of the invention are the following:

-   -   a) Prokaryotic host cells: Escherichia coli strains (e.g. DH5-α strain), Bacillus subtilis, Salmonella typhimurium, and strains from species like Pseudomonas, Streptomyces and Staphylococcus;
    -   b) Eukaryotic host cells: HeLa cells (ATCC No. CCL2; No. CCL2.1; No. CCL2.2), CV-1 cells (ATCC No. CCL70), COS cells (ATCC No. CRL1650; No. CRL1651), Sf-9 cells (ATCC No. CRL1711), C127 cells (ATCC No. CRL-1804), 3T3 (ATCC No. CRL-6361), CHO (ATCC No. CCL-61), human kidney 293 (ATCC No. 45504; No. CRL-1573) and BHK (ECACC No. 84100501; No. 84111301); and
    -   c) other mammalian host cells.

Expression of the APM1 gene in mammalian, and typically human, cells may be rendered defective; alternatively, an APM1 genomic or cDNA sequence may be inserted so that the APM1 gene counterpart in the genome of an animal cell is replaced by an APM1 polynucleotide according to the invention. These genetic alterations may be generated by homologous recombination events using specific DNA constructs that have been previously described.

Host cells that may be used include mammalian zygotes, such as murine zygotes. For example, murine zygotes may undergo microinjection with a purified DNA molecule of interest, for example a purified DNA molecule that has previously been adjusted to a concentration range from 1 ng/mL (for BAC inserts) to 3 ng/μL (for P1 bacteriophage inserts) in 10 mM Tris-HCl, pH 7.4, 250 μM EDTA containing 100 mM NaCl, 30 μM spermine, and 70 μM spermidine. When the DNA to be microinjected has a large size, polyamines and high salt concentrations can be used in order to avoid mechanical breakage of this DNA, as described by Schedl et al. (1993b).

Any of the polynucleotides of the invention, including the DNA constructs described herein, may be introduced in an embryonic stem (ES) cell line, preferably a mouse ES cell line. ES cell lines are derived from pluripotent, uncommitted cells of the inner cell mass of pre-implantation blastocysts. Preferred ES cell lines include the following: ES-E14TG2a (ATCC No. CRL-1821), ES-D3 (ATCC No. CRL1934 and No. CRL-11632), YS001 (ATCC No. CRL-11776), 36.5 (ATCC No. CRL-11116). To maintain ES cells in an uncommitted state, they are cultured in the presence of growth-inhibited feeder cells that provide the appropriate signals to preserve this embryonic phenotype and serve as a matrix for ES cell adherence. Preferred feeder cells consist of primary embryonic fibroblasts that are established from tissue of day 13-day 14 embryos of virtually any mouse strain, that are maintained in culture, such as described by Abbondanzo et al. (1993), and are inhibited in growth by irradiation, such as described by Robertson (1987), or by the presence of an inhibitory concentration of LIF, such as described by Pease and Williams (1990).

The constructs in the host cells can be used in a conventional manner to produce the gene product encoded by the recombinant sequence.

Following transformation of a suitable host and growth of the host to an appropriate cell density, the selected promoter is induced by appropriate means, such as temperature shift or chemical induction, and cells are cultivated for an additional period.

Cells are typically harvested by centrifugation, disrupted by physical or chemical means, and the resulting crude extract retained for further purification.

Microbial cells employed in the expression of proteins can be disrupted by any convenient method, including freeze-thaw cycling, sonication, mechanical disruption, or use of cell lysing agents. Such methods are well known to the skilled artisan.

Transgenic Animals

The terms “transgenic animals” or “host animals” are used herein to designate animals that have their genome genetically and artificially manipulated so as to include one of the nucleic acids according to the invention. Preferred animals are non-human mammals and include those belonging to a genus selected from Mus (e.g. mice), Rattus (e.g. rats) and Oryctolagus (e.g. rabbits) which have their genome artificially and genetically altered by the insertion of a nucleic acid according to the invention. In one embodiment, the invention encompasses non-human host mammals and animals comprising a recombinant vector of the invention or an APM1 gene disrupted by homologous recombination with a knock out vector.

The transgenic animals of the invention all include within a plurality of their cells a cloned recombinant or synthetic DNA sequence, more specifically one of the purified or isolated nucleic acids comprising an APM1 coding sequence, an APM1 regulatory polynucleotide or a DNA sequence encoding an antisense polynucleotide such as described in the present specification.

Preferred transgenic animals according to the invention contain in their somatic cells and/or in their germ line cells a polynucleotide selected from the following group of polynucleotides:

a) a purified or isolated nucleic acid encoding an APM1 polypeptide, or a polypeptide fragment or variant thereof;

b) a purified or isolated nucleic acid comprising at least 8, preferably at least 15, more preferably at least 25, consecutive nucleotides of a nucleotide sequence selected from the group consisting of:

-   -   1) the nucleotide sequence beginning at the nucleotide in position 1 and ending at the nucleotide in position 4811 of the nucleotide sequence of SEQ ID No 1 or a variant thereof or a sequence complementary thereto; more particularly, the nucleotide sequence beginning at the nucleotide in position 1 and ending at the nucleotide in position 3529 of the nucleotide sequence of SEQ ID No 1 or a variant thereof or a sequence complementary thereto;
    -   2) the nucleotide sequence beginning at the nucleotide in position 4852 and ending at the nucleotide in position 15142 of the nucleotide sequence of SEQ ID No 1 or a variant thereof or a sequence complementary thereto;
    -   3) the nucleotide sequence beginning at the nucleotide in position 15366 and ending at the nucleotide in position 16276 of the nucleotide sequence of SEQ ID No 1 or a variant thereof or a sequence complementary thereto; and
    -   4) the nucleotide sequence beginning at the nucleotide in position 20560 and ending at the nucleotide in position 20966 of the nucleotide sequence of SEQ ID No 1 or a variant thereof or a sequence complementary thereto;

c) a purified or isolated nucleic acid comprising at least 8 consecutive nucleotides, preferably at least 15, of the nucleotide sequence beginning at the nucleotide in position 1 and ending at the nucleotide in position 22 of the nucleotide sequence of SEQ ID No 5 or a variant thereof or a sequence complementary thereto;

d) a purified or isolated nucleic acid comprising an exon of the APM1 gene, a sequence complementary thereto or a variant thereof;

e) a purified or isolated nucleic acid comprising a combination of at least two exons of the APM1 gene, or the sequences complementary thereto, wherein the polynucleotides are arranged within the nucleic acid, from the 5′ end to the 3′ end of said nucleic acid, in the same order as in SEQ ID No 1;

f) a purified or isolated nucleic acid comprising the nucleotide sequence SEQ ID No 2 or the sequences complementary thereto or a biologically active fragment or a variant thereof;

g) a purified or isolated nucleic acid comprising the nucleotide sequence SEQ ID No 3, or the sequence complementary thereto or a biologically active fragment or a variant thereof;

h) a polynucleotide consisting of:

-   -   (1) a nucleic acid comprising a regulatory polynucleotide of SEQ ID No 2 or the sequences complementary thereto or a biologically active fragment or variant thereof;
    -   (2) a polynucleotide encoding a desired polypeptide or nucleic acid; or
    -   (3) optionally, a nucleic acid comprising a regulatory polynucleotide of SEQ ID No 3, or the sequence complementary thereto or a biologically active fragment or variant thereof; and

i) a DNA construct as described previously in the present specification.

The transgenic animals of the invention thus contain specific sequences of exogenous genetic material such as the nucleotide sequences described above in detail.

Further transgenic animals according to the invention contain in their somatic cells and/or in their germ line cells a polynucleotide comprising a biallelic marker selected from the group consisting of A1, A2, A3, A4, A5, A6, A7, A8, A9, A10, A11, A12, A13, A14, A15, A16, A17, A18, A19, A20, A21, A22, A23, A24, A25 and A26, and the complements thereof; optionally, wherein said APM1-related biallelic marker is selected from the group consisting of A1, A2, and A7 or the group consisting of A4 and A8.

In a first preferred embodiment, these transgenic animals may be good experimental models for studying the diverse pathologies related to cell differentiation, in particular those transgenic animals within the genome of which one or several copies of a polynucleotide encoding a native APM1 protein, or alternatively a mutant APM1 protein, have been inserted.

In a second preferred embodiment, these transgenic animals may express a desired polypeptide of interest under the control of the regulatory polynucleotides of the APM1 gene, leading to good yields in the synthesis of this protein of interest, and possibly to tissue-specific expression of this protein of interest.

The transgenic animals of the invention may be designed according to conventional techniques well known to those skilled in the art. For more details regarding the production of transgenic animals, and specifically transgenic mice, reference may be made to U.S. Pat. No. 4,873,191, issued Oct. 10, 1989, U.S. Pat. No. 5,464,764, issued Nov. 7, 1995, and U.S. Pat. No. 5,789,215, issued Aug. 4, 1998, these documents being herein incorporated by reference to disclose methods for producing transgenic mice.

Transgenic animals of the present invention are produced by the application of procedures which result in an animal with a genome that has incorporated exogenous genetic material. The procedure involves obtaining the genetic material, or a portion thereof, which encodes either an APM1 coding sequence, an APM1 regulatory polynucleotide or a DNA sequence encoding an APM1 antisense polynucleotide such as described in the present specification.

A recombinant polynucleotide of the invention is inserted into an embryonic stem (ES) cell line. The insertion is preferably made using electroporation, such as described by Thomas et al. (1987). The cells subjected to electroporation are screened (e.g. by selection via selectable markers, by PCR or by Southern blot analysis) to find positive cells which have integrated the exogenous recombinant polynucleotide into their genome, preferably via a homologous recombination event. An illustrative positive-negative selection procedure that may be used according to the invention is described by Mansour et al. (1988).

Then, the positive cells are isolated, cloned and injected into 3.5-day-old blastocysts from mice, such as described by Bradley (1987). The blastocysts are then inserted into a female host animal and allowed to grow to term.

Alternatively, the positive ES cells are brought into contact with 2.5-day-old embryos at the 8-16 cell stage (morulae), such as described by Wood et al. (1993) or by Nagy et al. (1993), the ES cells being internalized to colonize extensively the blastocyst, including the cells which will give rise to the germ line.

The offspring of the female host are tested to determine which animals are transgenic, i.e. include the inserted exogenous DNA sequence, and which are wild-type.

Thus, the present invention also concerns a transgenic animal containing a nucleic acid, a recombinant expression vector or a recombinant host cell according to the invention.

Recombinant Cell Lines Derived from the Transgenic Animals of the Invention.

A further object of the invention consists of recombinant host cells obtained from a transgenic animal described herein. In one embodiment the invention encompasses cells derived from non-human host mammals and animals comprising a recombinant vector of the invention or an APM1 gene disrupted by homologous recombination with a knock out vector.

Recombinant cell lines may be established in vitro from cells obtained from any tissue of a transgenic animal according to the invention, for example by transfection of primary cell cultures with vectors expressing oncogenes such as SV40 large T antigen, as described by Chou (1989) and Shay et al. (1991).

Method for Producing an APM1 Polypeptide

It is now easy to produce proteins in high amounts by genetic engineering techniques through expression vectors such as plasmids, phages or phagemids. The polynucleotide that codes for the APM1 protein is inserted in an appropriate expression vector in order to produce the polypeptide of interest in vitro.

Thus, the present invention also concerns a method for producing an APM1 protein, and especially a polypeptide of SEQ ID No 6, wherein said method comprises:

-   -   a) culturing, in an appropriate culture medium, a cell host previously transformed or transfected with the recombinant vector comprising a nucleic acid encoding the APM1 protein;
    -   b) harvesting the culture medium thus conditioned or lysing the cell host, for example by sonication or by an osmotic shock;
    -   c) separating or purifying, from said culture medium, or from the pellet of the resultant host cell lysate, the thus produced polypeptide of interest; and
    -   d) optionally characterizing the produced polypeptide of interest.

In a specific embodiment of the above method, the nucleic acid coding for the APM1 protein is inserted in an appropriate vector, optionally after an appropriate cleavage of this amplified nucleic acid with one or several restriction endonucleases. In a preferred embodiment, the nucleic acid encoding the APM1 protein is selected from a group consisting of SEQ ID No 1 or a fragment thereof and SEQ ID No 5. In a further embodiment, the nucleic acid encoding the APM1 protein comprises an allele of at least one of the biallelic markers A1 to A26. The nucleic acid coding for the APM1 protein may be the resulting product of an amplification reaction using a pair of primers according to the invention (by SDA, TAS, 3SR, NASBA, TMA etc.).

The polypeptides according to the invention may be characterized by binding onto an immunoaffinity chromatography column on which polyclonal or monoclonal antibodies directed to a polypeptide of SEQ ID No 6 have previously been immobilized.

The polypeptides or peptides thus obtained may be purified, for example by high performance liquid chromatography, such as reverse phase and/or cationic exchange HPLC, as described by Rougeot et al. (1994). The reason to prefer this kind of peptide or protein purification is the lack of byproducts found in the elution samples, which renders the resultant purified protein or peptide more suitable for a therapeutic use.

Method for Screening Substances Interacting with the Regulatory Sequences of the APM1 Gene.

The present invention also concerns a method for screening substances or molecules that are able to interact with the regulatory sequences of the APM1 gene, such as for example promoter or enhancer sequences.

Nucleic acids encoding proteins which are able to interact with the regulatory sequences of the APM1 gene, more particularly a nucleotide sequence selected from the group consisting of the polynucleotides of SEQ ID Nos 2 and 3 or a fragment or variant thereof, and preferably a variant comprising one of the biallelic markers of the invention, may be identified by using a one-hybrid system, such as that described in the booklet enclosed in the Matchmaker One-Hybrid System kit from Clontech (Catalog Ref. No. K1603-1), the technical teachings of which are herein incorporated by reference. Briefly, the target nucleotide sequence is cloned upstream of a selectable reporter sequence and the resulting DNA construct is integrated in the yeast genome (Saccharomyces cerevisiae). The yeast cells containing the reporter sequence in their genome are then transformed with a library consisting of fusion molecules between cDNAs encoding candidate proteins for binding onto the regulatory sequences of the APM1 gene and sequences encoding the activator domain of a yeast transcription factor such as GAL4. The recombinant yeast cells are plated in a culture broth for selecting cells expressing the reporter sequence. The recombinant yeast cells thus selected contain a fusion protein that is able to bind onto the target regulatory sequence of the APM1 gene. Then, the cDNAs encoding the fusion proteins are sequenced and may be cloned into expression or transcription vectors in vitro. The binding of the encoded polypeptides to the target regulatory sequences of the APM1 gene may be confirmed by techniques familiar to the one skilled in the art, such as gel retardation assays or DNase protection assays.

Gel retardation assays may also be performed independently in order to screen candidate molecules that are able to interact with the regulatory sequences of the APM1 gene, such as described by Fried and Crothers (1981), Garner and Revzin (1981) and Dent and Latchman (1993), the teachings of these publications being herein incorporated by reference. These techniques are based on the principle that a DNA fragment which is bound to a protein migrates more slowly than the same unbound DNA fragment. Briefly, the target nucleotide sequence is labeled. Then the labeled target nucleotide sequence is brought into contact with either a total nuclear extract from cells containing transcription factors, or with different candidate molecules to be tested. The interaction between the target regulatory sequence of the APM1 gene and the candidate molecule or the transcription factor is detected after gel or capillary electrophoresis through a retardation in the migration.

Method for Screening Ligands that Modulate the Expression of the APM1 Gene.

Another subject of the present invention is a method for screening molecules that modulate the expression of the APM1 protein. Such a screening method comprises:

-   -   a) cultivating a prokaryotic or a eukaryotic cell that has been transfected with a nucleotide sequence encoding the APM1 protein or a variant or a fragment thereof, placed under the control of its own promoter;
    -   b) bringing into contact the cultivated cell with a molecule to be tested; and
    -   c) quantifying the expression of the APM1 protein or a variant or a fragment thereof.

In an embodiment, the nucleotide sequence encoding the APM1 protein or a variant or a fragment thereof comprises an allele of at least one of the biallelic markers A1, A2, A3, A4, A5, A6, A7, A8, A9, A10, A11, A12, A13, A14, A15, A16, A17, A18, A19, A20, A21, A22, A23, A24, A25 and A26, and the complements thereof; optionally, wherein said APM1-related biallelic marker is selected from the group consisting of A1, A2, and A7 or the group consisting of A4 and A8.

Using DNA recombination techniques well known to those skilled in the art, the APM1 protein-encoding DNA sequence is inserted into an expression vector, downstream from its promoter sequence. As an illustrative example, the promoter sequence of the APM1 gene is contained in the nucleic acid of SEQ ID No 2.

The quantification of the expression of the APM1 protein may be performed either at the mRNA level or at the protein level. In the latter case, polyclonal or monoclonal antibodies may be used to quantify the amounts of the APM1 protein that have been produced, for example in an ELISA or a RIA assay.

In a preferred embodiment, the quantification of the APM1 mRNA is performed by a quantitative PCR amplification of the cDNA obtained by a reverse transcription of the total mRNA of the cultivated APM1-transfected host cell, using a pair of primers specific for APM1.

The present invention also concerns a method for screening substances or molecules that are able to increase, or in contrast to decrease, the level of expression of the APM1 gene. Such a method may allow one skilled in the art to select substances exerting a regulating effect on the expression level of the APM1 gene and which may be useful as active ingredients included in pharmaceutical compositions for treating patients suffering from deficiencies in the regulation of expression of the APM1 gene, particularly patients suffering from obesity.

The invention also features a method for screening a candidate substance or molecule for modulation of the expression of the APM1 gene, comprising:

-   -   a) providing a recombinant cell host containing a nucleic acid, wherein said nucleic acid comprises a nucleotide sequence of SEQ ID No 2 or a biologically active fragment or variant thereof located upstream of a polynucleotide encoding a detectable protein;
    -   b) obtaining a candidate substance; and
    -   c) determining the ability of the candidate substance to modulate the expression levels of the polynucleotide encoding the detectable protein.

In a specific embodiment, the nucleic acid comprising a nucleotide sequence of SEQ ID No 2 or a biologically active fragment or variant thereof includes a biallelic marker selected from the group consisting of A1, A2, A3, A4, A5, A6, A7, A8, A9, A10, A11, A12, A13, A14, A15, A16, A17, A18, A19, A20, A21, A22, A23, A24, A25 and A26 or the complements thereof; optionally, wherein said APM1-related biallelic marker is selected from the group consisting of A1, A2, and A7 or the group consisting of A4 and A8.

In a further embodiment, the nucleic acid comprising the nucleotide sequence of SEQ ID No 2 or a biologically active fragment or variant thereof also includes a 5′UTR region of the APM1 cDNA of SEQ ID No 5, or one of its biologically active fragments or variants thereof.

Among the preferred polynucleotides encoding a detectable protein, there may be cited polynucleotides encoding beta galactosidase, green fluorescent protein (GFP) and chloramphenicol acetyl transferase (CAT).

The invention also pertains to kits useful for performing the hereinbefore described screening method. Preferably, such kits comprise a recombinant vector that allows the expression of a nucleotide sequence of SEQ ID No 2 or a biologically active fragment or variant thereof located upstream and operably linked to a polynucleotide encoding a detectable protein or the APM1 protein or a fragment or a variant thereof.

In another embodiment, a method for the screening of a candidate substance or molecule for modulation of the expression of the APM1 gene comprises:

-   -   a) providing a recombinant host cell containing a nucleic acid, wherein said nucleic acid comprises a 5′UTR sequence of the APM1 cDNA of SEQ ID No 5, or one of its biologically active fragments or variants, the 5′UTR sequence or its biologically active fragment or variant being operably linked to a polynucleotide encoding a detectable protein;
    -   b) obtaining a candidate substance; and
    -   c) determining the ability of the candidate substance to modulate the expression levels of the polynucleotide encoding the detectable protein.

In a specific embodiment of the above screening method, the nucleic acid that comprises a nucleotide sequence selected from the group consisting of the 5′UTR sequence of the APM1 cDNA of SEQ ID No 5 or one of its biologically active fragments or variants, includes a promoter sequence which is endogenous with respect to the APM1 5′UTR sequence.

In another specific embodiment of the above screening method, the nucleic acid that comprises a nucleotide sequence selected from the group consisting of the 5′UTR sequence of the APM1 cDNA of SEQ ID No 5 or one of its biologically active fragments or variants, includes a promoter sequence which is exogenous with respect to the APM1 5′UTR sequence defined therein.

In a further preferred embodiment, the nucleic acid comprising the 5′UTR sequence of the APM1 cDNA of SEQ ID No 5 or the biologically active fragments thereof includes a biallelic marker selected from the group consisting of A1, A2, A3, A4, A5, A6, A7, A8, A9, A10, A11, A12, A13, A14, A15, A16, A17, A18, A19, A20, A21, A22, A23, A24, A25 and A26 or the complements thereof; optionally, wherein said APM1-related biallelic marker is selected from the group consisting of A1, A2, and A7 or the group consisting of A4 and A8.

The invention further deals with a kit for the screening of a candidate substance modulating the expression of the APM1 gene, wherein said kit comprises a recombinant vector that comprises a nucleic acid including a 5′UTR sequence of the APM1 cDNA of SEQ ID No 5, or one of its biologically active fragments or variants, the 5′UTR sequence or its biologically active fragment or variant being operably linked to a polynucleotide encoding a detectable protein.

For the design of suitable recombinant vectors useful for performing the screening methods described above, reference is made to the section of the present specification wherein the preferred recombinant vectors of the invention are detailed.

Expression levels and patterns of APM1 may be analyzed by solution hybridization with long probes as described in International Patent Application No. WO 97/05277, which is hereby incorporated herein by reference in its entirety including any figures, tables, or references. Briefly, the APM1 cDNA or the APM1 genomic DNA described above, or fragments thereof, is inserted at a cloning site immediately downstream of a bacteriophage (T3, T7 or SP6) RNA polymerase promoter to produce antisense RNA. Preferably, the APM1 insert comprises at least 100 or more consecutive nucleotides of the genomic DNA sequence or the cDNA sequences. The plasmid is linearized and transcribed in the presence of ribonucleotides comprising modified ribonucleotides (i.e. biotin-UTP and DIG-UTP). An excess of this doubly labeled RNA is hybridized in solution with mRNA isolated from cells or tissues of interest. The hybridizations are performed under standard stringent conditions (40-50° C. for 16 hours in an 80% formamide, 0.4 M NaCl buffer, pH 7-8). The unhybridized probe is removed by digestion with ribonucleases specific for single-stranded RNA (i.e. RNases CL3, T1, Phy M, U2 or A). The presence of the biotin-UTP modification enables capture of the hybrid on a microtitration plate coated with streptavidin. The presence of the DIG modification enables the hybrid to be detected and quantified by ELISA using an anti-DIG antibody coupled to alkaline phosphatase.

Quantitative analysis of APM1 gene expression may also be performed using arrays. As used herein, the term “array” means a one dimensional, two dimensional, or multidimensional arrangement of a plurality of nucleic acids of sufficient length to permit specific detection of expression of mRNAs capable of hybridizing thereto. For example, the arrays may contain a plurality of nucleic acids derived from genes whose expression levels are to be assessed. The arrays may include the APM1 genomic DNA, the APM1 cDNA sequences or the sequences complementary thereto or fragments thereof, particularly those comprising at least one of the biallelic markers according to the present invention, preferably at least one of the biallelic markers A1 to A26. Preferably, the fragments are at least 15 nucleotides in length. In other embodiments, the fragments are at least 25 nucleotides in length. In some embodiments, the fragments are at least 50 nucleotides in length. More preferably, the fragments are at least 100 nucleotides in length. In another preferred embodiment, the fragments are more than 100 nucleotides in length. In some embodiments the fragments may be more than 500 nucleotides in length.

For example, quantitative analysis of APM1 gene expression may be performed with a complementary DNA microarray as described by Schena et al. (1995 and 1996). Full length APM1 cDNAs or fragments thereof are amplified by PCR and arrayed from a 96-well microtiter plate onto silylated microscope slides using high-speed robotics. Printed arrays are incubated in a humid chamber to allow rehydration of the array elements and rinsed, once in 0.2% SDS for 1 min, twice in water for 1 min and once for 5 min in sodium borohydride solution. The arrays are submerged in water for 2 min at 95° C., transferred into 0.2% SDS for 1 min, rinsed twice with water, air dried and stored in the dark at 25° C.

Cell or tissue mRNA is isolated or commercially obtained and probes are prepared by a single round of reverse transcription. Probes are hybridized to 1 cm² microarrays under a 14×14 mm glass coverslip for 6-12 hours at 60° C. Arrays are washed for 5 min at 25° C. in low stringency wash buffer (1×SSC/0.2% SDS), then for 10 min at room temperature in high stringency wash buffer (0.1×SSC/0.2% SDS). Arrays are scanned in 0.1×SSC using a fluorescence laser scanning device fitted with a custom filter set. Accurate differential expression measurements are obtained by taking the average of the ratios of two independent hybridizations.
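The averaging step described above can be illustrated with a minimal Python sketch. It assumes that each independent hybridization yields, for a given array element, a pair of background-corrected fluorescence intensities (test sample and reference sample); this two-signal layout, the function names and the numerical values are hypothetical and are not taken from Schena et al.

```python
from statistics import mean


def expression_ratio(test_signal: float, reference_signal: float) -> float:
    """Ratio of test to reference intensity for one array element."""
    if reference_signal <= 0:
        raise ValueError("reference signal must be positive")
    return test_signal / reference_signal


def averaged_ratio(hybridizations) -> float:
    """Average the ratios obtained from independent hybridizations.

    `hybridizations` is a sequence of (test_signal, reference_signal)
    pairs, one pair per independent hybridization (assumed layout).
    """
    ratios = [expression_ratio(t, r) for t, r in hybridizations]
    if len(ratios) < 2:
        raise ValueError("at least two independent hybridizations are expected")
    return mean(ratios)


# Hypothetical intensities for one APM1 array element in two hybridizations.
example = [(1200.0, 400.0), (1000.0, 500.0)]
print(averaged_ratio(example))  # ratios 3.0 and 2.0, prints 2.5
```

In this sketch a single ratio is rejected, since the text relies on two independent hybridizations to obtain an accurate measurement.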

Quantitative analysis of APM1 gene expression may also be performed with full length APM1 cDNAs or fragments thereof in complementary DNA arrays as described by Pietu et al. (1996). The full length APM1 cDNA or fragments thereof are PCR amplified and spotted on membranes. Then, mRNAs originating from various tissues or cells are labeled with radioactive nucleotides. After hybridization and washing in controlled conditions, the hybridized mRNAs are detected by phospho-imaging or autoradiography. Duplicate experiments are performed and a quantitative analysis of differentially expressed mRNAs is then performed.

Alternatively, expression analysis using the APM1 genomic DNA, the APM1 cDNA, or fragments thereof can be done through high density nucleotide arrays as described by Lockhart et al. (1996) and Sosnowski et al. (1997). Oligonucleotides of 15-50 nucleotides from the sequences of the APM1 genomic DNA, the APM1 cDNA sequences, particularly those comprising at least one of the biallelic markers according to the present invention, preferably at least one biallelic marker selected from the group consisting of A1 to A26, or the sequences complementary thereto, are synthesized directly on the chip (Lockhart et al., supra) or synthesized and then addressed to the chip (Sosnowski et al., supra). Preferably, the oligonucleotides are about 20 nucleotides in length.
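As an illustration of how oligonucleotides of 15-50 nucleotides comprising a biallelic marker might be enumerated from a longer sequence, the following Python sketch lists every window of a chosen length that contains the polymorphic base. The function name, the 0-based marker index and the toy sequence are assumptions made for the example; they do not correspond to the actual positions of A1 to A26 in SEQ ID No 1.

```python
def marker_oligos(sequence: str, marker_index: int, length: int = 20):
    """Return every window of `length` nucleotides that contains the marker.

    `marker_index` is the 0-based position of the polymorphic base in
    `sequence` (a hypothetical coordinate convention for this sketch).
    """
    if not 15 <= length <= 50:
        raise ValueError("length must lie in the 15-50 nucleotide range")
    if not 0 <= marker_index < len(sequence):
        raise ValueError("marker position lies outside the sequence")
    # Leftmost start: the marker is the last base of the window.
    first_start = max(0, marker_index - length + 1)
    # Rightmost start: the marker is the first base of the window,
    # limited so that the window does not run past the sequence end.
    last_start = min(marker_index, len(sequence) - length)
    return [sequence[s:s + length] for s in range(first_start, last_start + 1)]


# Toy 40-nucleotide sequence with a hypothetical marker at index 20.
toy_sequence = "ACGT" * 10
oligos = marker_oligos(toy_sequence, marker_index=20, length=20)
print(len(oligos), "candidate oligonucleotides of", len(oligos[0]), "nt")
```

The default length of 20 reflects the stated preference for oligonucleotides of about 20 nucleotides; the complementary strand could be tiled in the same way.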

APM1 cDNA probes labeled with an appropriate compound, such as biotin, digoxigenin or fluorescent dye, are synthesized from the appropriate mRNA population and then randomly fragmented to an average size of 50 to 100 nucleotides. These probes are then hybridized to the chip. After washing as described in Lockhart et al., supra, and application of different electric fields (Sosnowski et al., 1997), the dyes or labeling compounds are detected and quantified. Duplicate hybridizations are performed. Comparative analysis of the intensity of the signal originating from cDNA probes on the same target oligonucleotide in different cDNA samples indicates a differential expression of APM1 mRNA.

Methods for Inhibiting the Expression of an APM1 Gene

Other therapeutic compositions according to the present invention advantageously comprise an oligonucleotide fragment of the nucleic sequence of APM1 as an antisense tool or a triple helix tool that inhibits the expression of the corresponding APM1 gene. A preferred fragment of the nucleic sequence of APM1 comprises an allele of at least one of the biallelic markers A1 to A26.

Antisense Approach

Preferred methods using antisense polynucleotides according to the present invention are the procedures described by Sczakiel et al. (1995).

Preferably, the antisense tools are chosen among the polynucleotides (15-200 bp long) that are complementary to the 5′ end of the APM1 mRNA. In another embodiment, a combination of different antisense polynucleotides complementary to different parts of the desired targeted gene is used.

Preferred antisense polynucleotides according to the present invention are complementary to a sequence of the mRNAs of APM1 that contains either the translation initiation codon ATG or a splicing donor or acceptor site.
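As a rough illustration of how an antisense sequence complementary to a region containing the initiation codon could be derived, the sketch below takes the reverse complement of a window around a caller-supplied ATG position. It assumes a cDNA-style sequence written with the letters A, C, G and T; the function name, the `flank` parameter and the toy sequence are hypothetical, and no APM1 sequence data is used.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def antisense_around_atg(cdna, atg_index, flank=10):
    """Return the reverse complement of a window spanning the ATG codon.

    `atg_index` is the 0-based index of the A of the initiation codon;
    `flank` is the number of bases kept on each side (hypothetical default).
    """
    if cdna[atg_index:atg_index + 3] != "ATG":
        raise ValueError("atg_index does not point at an ATG codon")
    start = max(0, atg_index - flank)
    end = min(len(cdna), atg_index + 3 + flank)
    target = cdna[start:end]
    # Complement each base, then reverse to read the antisense strand 5' to 3'.
    return target.translate(COMPLEMENT)[::-1]


print(antisense_around_atg("GGGCCCATGAAATTT", 6, flank=3))  # TTTCATGGG
```

In practice the window would be widened so that the resulting polynucleotide falls within the 15-200 base range given above.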

The antisense nucleic acids should have a length and melting temperature sufficient to permit formation of an intracellular duplex having sufficient stability to inhibit the expression of the APM1 mRNA in the duplex. Strategies for designing antisense nucleic acids suitable for use in gene therapy are disclosed in Green et al. (1986) and Izant and Weintraub (1984), the disclosures of which are incorporated herein by reference.

In some strategies, antisense molecules are obtained by reversing the orientation of the APM1 coding region with respect to a promoter so as to transcribe the opposite strand from that which is normally transcribed in the cell. The antisense molecules may be transcribed using in vitro transcription systems such as those which employ T7 or SP6 polymerase to generate the transcript. Another approach involves transcription of APM1 antisense nucleic acids in vivo by operably linking DNA containing the antisense sequence to a promoter in a suitable expression vector.

Alternatively, suitable antisense strategies are those described by Rossi et al. (1991), in the International Applications Nos. WO 94/23026, WO 95/04141, WO 92/18522 and in the European Patent Application No. EP 0 572 287 A2.

An alternative to the antisense technology that is used according to the present invention consists in using ribozymes that will bind to a target sequence via their complementary polynucleotide tail and that will cleave the corresponding RNA by hydrolyzing its target site (namely “hammerhead ribozymes”). Briefly, the simplified cycle of a hammerhead ribozyme consists of (1) sequence specific binding to the target RNA via complementary antisense sequences; (2) site-specific hydrolysis of the cleavable motif of the target strand; and (3) release of cleavage products, which gives rise to another catalytic cycle. Indeed, the use of long-chain antisense polynucleotides (at least 30 bases long) or ribozymes with long antisense arms is advantageous. A preferred delivery system for antisense ribozymes is achieved by covalently linking these antisense ribozymes to lipophilic groups or by using liposomes as a convenient vector. Preferred antisense ribozymes according to the present invention are prepared as described by Sczakiel et al. (1995), the specific preparation procedures referred to in said article being herein incorporated by reference.

Triple Helix Approach

The APM1 genomic DNA may also be used to inhibit the expression of the APM1 gene based on intracellular triple helix formation.

Triple helix oligonucleotides are used to inhibit transcription from a genome. They are particularly useful for studying alterations in cell activity when it is associated with a particular gene.

Similarly, a portion of the APM1 genomic DNA can be used to study the effect of inhibiting APM1 transcription within a cell. Traditionally, homopurine sequences were considered the most useful for triple helix strategies. However, homopyrimidine sequences can also inhibit gene expression. Such homopyrimidine oligonucleotides bind to the major groove at homopurine:homopyrimidine sequences. Thus, both types of sequences from the APM1 genomic DNA are contemplated within the scope of this invention.

To carry out gene therapy strategies using the triple helix approach, the sequences of the APM1 genomic DNA are first scanned to identify 10-mer to 20-mer homopyrimidine or homopurine stretches which could be used in triple-helix based strategies for inhibiting APM1 expression. Following identification of candidate homopyrimidine or homopurine stretches, their efficiency in inhibiting APM1 expression is assessed by introducing varying amounts of oligonucleotides containing the candidate sequences into tissue culture cells which express the APM1 gene.
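The scanning step described above might be sketched as a simple regular-expression search. This is an illustrative assumption about how such a scan could be implemented, not a description of the software actually used; the function name and the toy sequence are hypothetical. The sketch reports maximal runs of at least 10 bases, and runs longer than 20 bases would still need to be trimmed to the 10-mer to 20-mer range by the caller.

```python
import re

# Base classes for the two kinds of stretch named in the text.
BASE_CLASSES = {"homopurine": "[AG]", "homopyrimidine": "[CT]"}


def find_triplex_candidates(seq, min_len=10):
    """Return (start, end, kind, stretch) for each maximal run of >= min_len bases.

    Coordinates are 1-based and inclusive.
    """
    seq = seq.upper()
    hits = []
    for kind, base_class in BASE_CLASSES.items():
        pattern = f"{base_class}{{{min_len},}}"  # e.g. "[AG]{10,}"
        for match in re.finditer(pattern, seq):
            hits.append((match.start() + 1, match.end(), kind, match.group()))
    return sorted(hits)


toy = "ACGT" + "AGGAAGGAGAAG" + "CATG" + "CTTCCTCTTC" + "GA"
for hit in find_triplex_candidates(toy):
    print(hit)
# (5, 16, 'homopurine', 'AGGAAGGAGAAG')
# (21, 30, 'homopyrimidine', 'CTTCCTCTTC')
```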

The oligonucleotides can be introduced into the cells using a variety of methods known to those skilled in the art, including but not limited to calcium phosphate precipitation, DEAE-Dextran, electroporation, liposome-mediated transfection or native uptake.

Treated cells are monitored for altered cell function or reduced APM1 expression using techniques such as Northern blotting, RNase protection assays, or PCR based strategies to monitor the transcription levels of the APM1 gene in cells which have been treated with the oligonucleotide.

The oligonucleotides which are effective in inhibiting gene expression in tissue culture cells may then be introduced in vivo using the techniques described above in the antisense approach, at a dosage calculated based on the in vitro results.

In some embodiments, the natural (beta) anomers of the oligonucleotide units can be replaced with alpha anomers to render the oligonucleotide more resistant to nucleases. Further, an intercalating agent such as ethidium bromide, or the like, can be attached to the 3′ end of the alpha oligonucleotide to stabilize the triple helix. For information on the generation of oligonucleotides suitable for triple helix formation see Griffin et al. (1989), which is hereby incorporated by this reference.

Throughout the application it is specifically contemplated that in each case of a list of biallelic markers, or probes, or primers, that list is also envisioned to include all but any one of its members, or all but any two, or all but any three, until there is only one remaining member. The examples that follow are exemplary only, and not to be taken as meant to limit the invention in any way.

EXAMPLES

Example 1 Identification of Biallelic Markers: DNA Extraction

Donors were unrelated and healthy. They were sufficiently diverse to be representative of a heterogeneous French population. The DNA from 100 individuals was extracted and tested for the detection of the biallelic markers.

30 mL of peripheral venous blood were taken from each donor in the presence of EDTA. Cells (pellet) were collected after centrifugation for 10 minutes at 2000 rpm. Red cells were lysed by a lysis solution (50 mL final volume: 10 mM Tris pH 7.6; 5 mM MgCl₂; 10 mM NaCl). The solution was centrifuged (10 minutes, 2000 rpm) as many times as necessary to eliminate the residual red cells present in the supernatant, after resuspension of the pellet in the lysis solution.

The pellet of white cells was lysed overnight at 42° C. with 3.7 mL of lysis solution composed of:

3 mL TE 10-2 (Tris-HCl 10 mM, EDTA 2 mM)/NaCl 0.4 M

200 μL SDS 10%

500 μL K-proteinase (2 mg K-proteinase in TE 10-2/NaCl 0.4 M).

For the extraction of proteins, 1 mL saturated NaCl (6M) (1/3.5 v/v) was added. After vigorous agitation, the solution was centrifuged for 20 minutes at 10000 rpm.

For the precipitation of DNA, 2 to 3 volumes of 100% ethanol were added to the previous supernatant, and the solution was centrifuged for 30 minutes at 2000 rpm. The DNA solution was rinsed three times with 70% ethanol to eliminate salts, and was centrifuged for 20 minutes at 2000 rpm. The pellet was dried at 37° C., and was resuspended in 1 mL TE 10-1 or 1 mL water. The DNA concentration was evaluated by measuring the OD at 260 nm (1 unit OD = 50 μg/mL DNA).

To determine the presence of proteins in the DNA solution, the OD 260/OD 280 ratio was determined. Only DNA preparations having an OD 260/OD 280 ratio between 1.8 and 2 were used in the subsequent examples described below.
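The two quality-control calculations above (concentration from the OD at 260 nm, and the OD 260/OD 280 purity window) can be written out as a short sketch. The function names and the `dilution_factor` parameter are hypothetical additions for illustration; only the 50 μg/mL conversion and the 1.8 to 2 window come from the text.

```python
UG_PER_ML_PER_OD260 = 50.0  # 1 unit OD at 260 nm = 50 ug/mL DNA (from the text)


def dna_concentration_ug_per_ml(od260, dilution_factor=1.0):
    """Estimate DNA concentration; dilution_factor is an assumed convenience."""
    return od260 * UG_PER_ML_PER_OD260 * dilution_factor


def passes_purity_check(od260, od280, low=1.8, high=2.0):
    """True when the OD 260/OD 280 ratio falls within the accepted window."""
    return low <= od260 / od280 <= high


print(dna_concentration_ug_per_ml(0.5))   # 25.0
print(passes_purity_check(0.95, 0.5))     # True  (ratio 1.9)
print(passes_purity_check(0.75, 0.5))     # False (ratio 1.5)
```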

The pool was constituted by mixing equivalent quantities of DNA from each individual.

Example 2 Identification of Biallelic Markers: Amplification of Genomic DNA by PCR

The amplification of specific genomic sequences of the DNA samples of example 1 was carried out on the pool of DNA obtained previously. In addition, 50 individual samples were similarly amplified.

PCR assays were performed using the following protocol:

Final volume                                           25 μL
DNA                                                    2 ng/μL
MgCl₂                                                  2 mM
dNTP (each)                                            200 μM
Primer (each)                                          2.9 ng/μL
Ampli Taq Gold DNA polymerase                          0.05 unit/μL
PCR buffer (10× = 0.1 M Tris-HCl pH 8.3, 0.5 M KCl)    1×

Each pair of first primers was designed using the sequence information of the APM1 gene disclosed herein and the OSP software (Hillier & Green, 1991). This first pair of primers was about 20 nucleotides in length and had the sequence corresponding to the SEQ ID positions disclosed in Table 1 in the columns labeled PU and RP.

TABLE 1
Amplicon   Position of the amplicon   PU primer   Position of PU   RP primer   Position of RP
           in SEQ ID 1                            in SEQ ID 1                  in SEQ ID 1
9-27       3528-3946                  B1          3528-3545        C1          3928-3946
9-28       3892-4321                  B2          3892-3911        C2          4303-4321
99-14402   4155-4602                  B3          4155-4175        C3          4584-4602
9-29       4223-4642                  B4          4223-4242        C4          4623-4642
9-30       4599-5027                  B5          4599-4618        C5          5008-5027
99-14387   10990-11442                B6          10990-11008      C6          11423-11442
99-14389   12472-12966                B7          12472-12491      C7          12946-12966
9-12       15073-15520                B8          15073-15092      C8          15503-15520
9-13       15131-15551                B9          15131-15150      C9          15532-15551
99-14405   15759-16211                B10         15759-15776      C10         16191-16211
9-14       16233-16652                B11         16233-16251      C11         16633-16652
9-15       16604-17025                B12         16604-16621      C12         17006-17025
9-16       16982-17402                B13         16982-17001      C13         17384-17402
9-17       17216-17517                B14         17216-17233      C14         17498-17517
9-18       17300-17503                B15         17300-17317      C15         17486-17503
17-30      730-1137                   B16         730-752          C16         1117-1137
17-31      4798-5385                  B17         4798-4819        C17         5364-5385
17-32      10614-11114                B18         10614-10635      C18         11093-11114
17-33      13843-14517                B19         13843-13865      C19         14496-14517
17-34      13843-14859                                             C20         14839-14859
17-35      14745-15219                B20         14745-14766      C21         15199-15219
17-36      15381-15987                B21         15381-15402      C22         15966-15987
17-37      17201-18261                B22         17201-17222      C23         18240-18261
17-38      18141-19336                B23         18141-18163      C24         19314-19336

Preferably, the primers contained a common oligonucleotide tail upstream of the specific bases targeted for amplification which was useful for sequencing.

Primers PU contain the following additional PU 5′ sequence: TGTAAAACGACGGCCAGT; primers RP contain the following RP 5′ sequence: CAGGAAACAGCTATGACC. The primer containing the additional PU 5′ sequence is listed in SEQ ID No 7. The primer containing the additional RP 5′ sequence is listed in SEQ ID No 8.
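A tailed primer pair of this kind could be assembled from amplicon coordinates roughly as follows. The two tail sequences are taken from the text; everything else is an assumption made for illustration: the function name, the 1-based inclusive coordinate convention, the toy sequence, and in particular the choice to take the RP-specific part as the reverse complement of the listed region, which the text does not state explicitly.

```python
PU_TAIL = "TGTAAAACGACGGCCAGT"  # additional PU 5' sequence given in the text
RP_TAIL = "CAGGAAACAGCTATGACC"  # additional RP 5' sequence given in the text
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def tailed_primers(genomic, pu_range, rp_range):
    """Build tailed PU and RP primers from 1-based inclusive coordinate ranges."""
    pu_start, pu_end = pu_range
    rp_start, rp_end = rp_range
    pu_specific = genomic[pu_start - 1:pu_end]
    # Assumption: the RP primer anneals to the opposite strand, so its
    # specific part is the reverse complement of the listed region.
    rp_specific = genomic[rp_start - 1:rp_end].translate(COMPLEMENT)[::-1]
    return PU_TAIL + pu_specific, RP_TAIL + rp_specific


pu, rp = tailed_primers("AAACCCGGGTTTACGTACGT", (1, 6), (15, 20))
print(pu)  # TGTAAAACGACGGCCAGTAAACCC
print(rp)  # CAGGAAACAGCTATGACCACGTAC
```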

The synthesis of these primers was performed following the phosphoramidite method, on a GENSET UFPS 24.1 synthesizer.

DNA amplification was performed on a Genius II thermocycler. After heating at 95° C. for 10 min, 40 cycles were performed. Each cycle comprised: 30 sec at 95° C., 1 min at 54° C., and 30 sec at 72° C. For final elongation, 10 min at 72° C. ended the amplification. The quantities of the amplification products obtained were determined on 96-well microtiter plates, using a fluorometer and Picogreen as intercalating agent (Molecular Probes).

Example 3 Identification of Biallelic Markers: Sequencing of Amplified Genomic DNA and Identification of Polymorphisms

The sequencing of the amplified DNA obtained in example 2 was carried out on ABI 377 sequencers. The sequences of the amplification products were determined using automated dideoxy terminator sequencing reactions with a dye terminator cycle sequencing protocol. The products of the sequencing reactions were run on sequencing gels and the sequences were determined using gel image analysis (ABI Prism DNA Sequencing Analysis software (2.1.2 version) and the above mentioned proprietary “Trace” basecaller).

The sequence data were further evaluated using the above mentioned polymorphism analysis software designed to detect the presence of biallelic markers among the pooled amplified fragments. The polymorphism search was based on the presence of superimposed peaks in the electrophoresis pattern resulting from different bases occurring at the same position as described previously.

15 fragments of amplification were analyzed. In 5 of these segments, 8 biallelic markers were detected. The localization of these biallelic markers is as shown in Table 2.

TABLE 2
Amplicon        Biallelic   Marker name              Localization in        Allele 1    Allele 2         Marker position
                marker                               APM1 gene                                           in SEQ ID No 1
9-27            A1          9-27/261                 5′ regulatory region   G           C                3787
99-14387        A2          99-14387/129             Intron 1               A           C                11118
9-12            A3          9-12/48                  Intron 1               T           C                15120
9-12 and 9-13   A4          9-12/124 or 9-13/66      Exon 2                 T           G                15196
9-12 and 9-13   A5          9-12/355 or 9-13/297     Intron 2               G           T                15427
9-12 and 9-13   A6          9-12/428 or 9-13/370     Intron 2               A           G                15500
99-14405        A7          99-14405/105             Intron 2               G           A                15863
9-16            A8          9-16/189                 Exon 3                 A           Del              17170
17-30           A9          17-30-216                5′ regulatory region   A           G                945
9-27            A10         9-27-211                 5′ regulatory region   A           G                3738
9-27            A11         9-27-246                 5′ regulatory region   A           G                3773
17-31           A12         17-31-298                Intron 1               A           G                5095
17-31           A13         17-31-413                Intron 1               C           T                5210
17-32           A14         17-32-24                 Intron 1               T           C                10637
99-14387        A15         99-14387-50              Intron 1               A           C                11039
99-14387        A16         99-14387-199             Intron 1               A           G                11188
17-33           A17         17-33-TGAGACT            Intron 1               no insert   TGAGACT insert   13973
17-34           A18         17-34-860                Intron 1               A           G                14702
17-34           A19         17-34-915                Intron 1               A           G                14757
17-35           A20         17-35-71                 Intron 1               C           T                14815
17-35           A21         17-35-306                Intron 1               G           T                15050
17-36           A22         17-36-47                 Intron 2               G           C                15680
17-36           A23         17-36-120                Intron 2               C           T                15790
17-37           A24         17-37-629                Exon 3                 A           G                17829
17-37           A25         17-37-811                Exon 3                 A           G                18011
17-38           A26         17-38-349                Exon 3                 C           T                18489

Example 4 Validation of the Polymorphisms Through Microsequencing

The biallelic markers identified in example 3 were further confirmed and their respective frequencies were determined through microsequencing. Microsequencing was carried out for each individual DNA sample described in Example 1.

Amplification from genomic DNA of individuals was performed by PCR as described above for the detection of the biallelic markers with the same set of PCR primers (Table 1).

The preferred primers used in microsequencing were about 19 nucleotides in length and hybridized just upstream of the considered polymorphic base. According to the invention, the primers used in microsequencing are detailed in Table 3.

TABLE 3
Marker name             Marker   Mis. 1   Position of Mis. 1   Mis. 2   Position of Mis. 2
                                          in SEQ ID No 1                in SEQ ID No 1
9-27/261                A1       D1       3768-3786            E1       3788-3806
99-14387/129            A2       D2       11099-11117          E2       11119-11137
9-12/48                 A3       D3       15101-15119          E3       15121-15139
9-12/124 or 9-13/66     A4       D4       15177-15195          E4       15197-15215
9-12/355 or 9-13/297    A5       D5       15408-15426          E5       15428-15446
9-12/428 or 9-13/370    A6       D6       15481-15499          E6       15501-15519
99-14405/105            A7       D7       15844-15862          E7       15864-15882
9-16/189                A8       D8       17151-17169          E8       17171-17189
17-30-216               A9       D9       926-944              E9       946-964
9-27-211                A10      D10      3719-3737            E10      3739-3757
9-27-246                A11      D11      3754-3772            E11      3774-3792
17-31-298               A12      D12      5076-5094            E12      5096-5114
17-31-413               A13      D13      5191-5209            E13      5211-5229
17-32-24                A14      D14      10618-10636          E14      10638-10656
99-14387-50             A15      D15      11020-11038          E15      11040-11058
99-14387-199            A16      D16      11169-11187          E16      11189-11207
17-33-TGAGACT           A17      D17      13954-13972          E17      13974-13992
17-34-860               A18      D18      14683-14701          E18      14703-14721
17-34-915               A19      D19      14738-14756          E19      14758-14776
17-35-71                A20      D20      14796-14814          E20      14816-14834
17-35-306               A21      D21      15031-15049          E21      15051-15069
17-36-47                A22      D22      15661-15679          E22      15681-15699
17-36-120               A23      D23      15771-15789          E23      15791-15809
17-37-629               A24      D24      17810-17828          E24      17830-17848
17-37-811               A25      D25      17992-18010          E25      18012-18030
17-38-349               A26      D26      18470-18488          E26      18490-18508

Mis 1 and Mis 2 refer to microsequencing primers that hybridize with the non-coding strand of the APM1 gene and with the coding strand of the APM1 gene, respectively.
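The coordinate pattern in Table 3 is regular: each Mis 1 range covers the 19 positions immediately before the marker position, and each Mis 2 range covers the 19 positions immediately after it. A small sketch (the function name is hypothetical) reproduces the tabulated ranges from the marker positions of Table 2:

```python
def microsequencing_primer_ranges(marker_pos, length=19):
    """Return ((mis1_start, mis1_end), (mis2_start, mis2_end)) in SEQ ID No 1.

    Both ranges are 1-based and inclusive and exclude the polymorphic base.
    """
    mis1 = (marker_pos - length, marker_pos - 1)
    mis2 = (marker_pos + 1, marker_pos + length)
    return mis1, mis2


print(microsequencing_primer_ranges(3787))  # ((3768, 3786), (3788, 3806)), marker A1
print(microsequencing_primer_ranges(945))   # ((926, 944), (946, 964)), marker A9
```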

The microsequencing reaction was performed as follows:

After purification of the amplification products, the microsequencing reaction mixture was prepared by adding, in a 20 μL final volume: 10 pmol microsequencing oligonucleotide, 1 U Thermosequenase (Amersham E79000G), 1.25 μL Thermosequenase buffer (260 mM Tris HCl pH 9.5, 65 mM MgCl₂), and the two appropriate fluorescent ddNTPs (Perkin Elmer, Dye Terminator Set 401095) complementary to the nucleotides at the polymorphic site of each biallelic marker tested, following the manufacturer's recommendations. After 4 minutes at 94° C., 20 PCR cycles of 15 sec at 55° C., 5 sec at 72° C., and 10 sec at 94° C. were carried out in a Tetrad PTC-225 thermocycler (MJ Research). The unincorporated dye terminators were then removed by ethanol precipitation. Samples were finally resuspended in formamide-EDTA loading buffer and heated for 2 min at 95° C. before being loaded on a polyacrylamide sequencing gel. The data were collected by an ABI PRISM 377 DNA sequencer and processed using the GENESCAN software (Perkin Elmer).

Following gel analysis, data were automatically processed with software that allows the determination of the alleles of biallelic markers present in each amplified fragment.

The software evaluates such factors as whether the intensities of the signals resulting from the above microsequencing procedures are weak, normal, or saturated, or whether the signals are ambiguous. In addition, the software identifies significant peaks (according to shape and height criteria). Among the significant peaks, peaks corresponding to the targeted site are identified based on their position. When two significant peaks are detected for the same position, each sample is classified as homozygous or heterozygous based on the height ratio.
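The classification logic described above can be illustrated with a simplified sketch. It is not the software referred to in the text: the function name and both thresholds (`min_height`, `het_ratio`) are hypothetical placeholders, since the text gives no numerical criteria, and the shape and saturation checks are omitted.

```python
def call_genotype(height_allele1, height_allele2, min_height=50.0, het_ratio=0.3):
    """Classify a sample from the peak heights of its two allele signals.

    min_height and het_ratio are illustrative thresholds, not values from the text.
    """
    strongest = max(height_allele1, height_allele2)
    if strongest < min_height:
        return "no call"  # signal too weak to interpret
    ratio = min(height_allele1, height_allele2) / strongest
    if ratio >= het_ratio:
        return "heterozygous"
    if height_allele1 > height_allele2:
        return "homozygous allele 1"
    return "homozygous allele 2"


print(call_genotype(1000.0, 900.0))  # heterozygous
print(call_genotype(1000.0, 20.0))   # homozygous allele 1
print(call_genotype(10.0, 5.0))      # no call
```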

Oligonucleotide probes may be used in genotyping biallelic markers by hybridization assays. The nucleic acid sample is contacted with one or more allele specific oligonucleotide probes which specifically hybridize to one or more alleles associated with a detectable phenotype. The probes are 25-mers with an APM1-related biallelic marker in the center position. Probes used in the hybridization assay may include the probes listed in Table 4.

TABLE 4
BM    Marker name      Position range of probes        Probes
                       in SEQ ID No (genomic)
A1    9-27/261         3775     3799                   P1
A2    99-14387/129     11106    11130                  P2
A3    9-12/48          15108    15132                  P3
A4    9-12/124         15184    15208                  P4
A5    9-12/355         15415    15439                  P5
A6    9-12/428         15488    15512                  P6
A7    99-14405/105     15851    15875                  P7
A8    9-16/189         17158    17182                  P8
A9    17-30-216        933      957                    P9
A10   9-27-211         3726     3750                   P10
A11   9-27-246         3761     3785                   P11
A12   17-31-298        5083     5107                   P12
A13   17-31-413        5198     5222                   P13
A14   17-32-24         10625    10649                  P14
A15   99-14387-50      11027    11051                  P15
A16   99-14387-199     11176    11200                  P16
A17   17-33-TGAGACT    13961    13985                  P17
A18   17-34-860        14690    14714                  P18
A19   17-34-915        14745    14769                  P19
A20   17-35-71         14803    14827                  P20
A21   17-35-306        15038    15062                  P21
A22   17-36-47         15668    15692                  P22
A23   17-36-120        15778    15802                  P23
A24   17-37-629        17817    17841                  P24
A25   17-37-811        17999    18023                  P25
A26   17-38-349        18477    18501                  P26
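Because each probe is a 25-mer centered on the marker, its range in Table 4 runs from 12 positions before the marker to 12 positions after it. The sketch below reproduces those ranges and, for a single-base marker, builds one probe per allele by substituting the central base. The function names and the toy sequence are hypothetical, and this simple substitution does not cover the deletion and insertion markers (A8 and A17).

```python
def probe_range(marker_pos, length=25):
    """Return the 1-based inclusive (start, end) of a probe centered on the marker."""
    half = length // 2
    return marker_pos - half, marker_pos + half


def allele_specific_probes(genomic, marker_pos, alleles, length=25):
    """Return {allele: probe} for a single-base marker at 1-based marker_pos."""
    start, end = probe_range(marker_pos, length)
    if start < 1 or end > len(genomic):
        raise ValueError("probe window extends beyond the sequence")
    window = genomic[start - 1:end]
    half = length // 2
    # Replace the central base with each allele in turn.
    return {allele: window[:half] + allele + window[half + 1:] for allele in alleles}


print(probe_range(3787))  # (3775, 3799), matching probe P1
toy = "A" * 12 + "G" + "C" * 12 + "TTTTT"
print(allele_specific_probes(toy, 13, ("G", "C")))
```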

Example 5 Preparation of Antibody Compositions to the 56-Glu Variant of APM1

Substantially pure protein or polypeptide is isolated from transfected or transformed cells containing an expression vector encoding the APM1 protein or a portion thereof. The concentration of protein in the final preparation is adjusted, for example, by concentration on an Amicon filter device, to the level of a few micrograms/mL. Monoclonal or polyclonal antibody to the protein can then be prepared as follows:

A. Monoclonal Antibody Production by Hybridoma Fusion

Monoclonal antibody to epitopes in the APM1 protein or a portion thereof can be prepared from murine hybridomas according to the classical method of Kohler, G. and Milstein, C., Nature 256:495 (1975) or derivative methods thereof. Also see Harlow, E., and D. Lane. 1988. Antibodies: A Laboratory Manual. Cold Spring Harbor Laboratory. pp. 53-242.

Briefly, a mouse is repetitively inoculated with a few micrograms of the APM1 protein or a portion thereof over a period of a few weeks. The mouse is then sacrificed, and the antibody producing cells of the spleen isolated. The spleen cells are fused by means of polyethylene glycol with mouse myeloma cells, and the excess unfused cells destroyed by growth of the system on selective media comprising aminopterin (HAT media). The successfully fused cells are diluted and aliquots of the dilution placed in wells of a microtiter plate where growth of the culture is continued. Antibody-producing clones are identified by detection of antibody in the supernatant fluid of the wells by immunoassay procedures, such as ELISA, as originally described by Engvall, E., Meth. Enzymol. 70:419 (1980), and derivative methods thereof. Selected positive clones can be expanded and their monoclonal antibody product harvested for use. Detailed procedures for monoclonal antibody production are described in Davis, L. et al. Basic Methods in Molecular Biology Elsevier, New York. Section 21-2.

B. Polyclonal Antibody Production by Immunization

Polyclonal antiserum containing antibodies to heterogeneous epitopes in the APM1 protein or a portion thereof can be prepared by immunizing a suitable non-human animal with the APM1 protein or a portion thereof, which can be unmodified or modified to enhance immunogenicity. A suitable non-human animal, preferably a non-human mammal, is selected, usually a mouse, rat, rabbit, goat, or horse. Alternatively, a crude preparation which has been enriched for APM1 concentration can be used to generate antibodies. Such proteins, fragments or preparations are introduced into the non-human mammal in the presence of an appropriate adjuvant (e.g. aluminum hydroxide, RIBI, etc.) which is known in the art. In addition, the protein, fragment or preparation can be pretreated with an agent which will increase antigenicity. Such agents are known in the art and include, for example, methylated bovine serum albumin (mBSA), bovine serum albumin (BSA), Hepatitis B surface antigen, and keyhole limpet hemocyanin (KLH). Serum from the immunized animal is collected, treated and tested according to known procedures. If the serum contains polyclonal antibodies to undesired epitopes, the polyclonal antibodies can be purified by immunoaffinity chromatography.

Effective polyclonal antibody production is affected by many factors related both to the antigen and the host species. Also, host animals vary in response to site of inoculations and dose, with both inadequate and excessive doses of antigen resulting in low titer antisera. Small doses (ng level) of antigen administered at multiple intradermal sites appear to be most reliable. Techniques for producing and processing polyclonal antisera are known in the art, see for example, Mayer and Walker (1987). An effective immunization protocol for rabbits can be found in Vaitukaitis, J. et al. J. Clin. Endocrinol. Metab. 33:988-991 (1971).

Booster injections can be given at regular intervals, and antiserum harvested when antibody titer thereof, as determined semi-quantitatively, for example, by double immunodiffusion in agar against known concentrations of the antigen, begins to fall. See, for example, Ouchterlony, O. et al., Chap. 19 in: Handbook of Experimental Immunology D. Wier (ed) Blackwell (1973). Plateau concentration of antibody is usually in the range of 0.1 to 0.2 mg/ml of serum (about 12 μM). Affinity of the antisera for the antigen is determined by preparing competitive binding curves, as described, for example, by Fisher, D., Chap. 42 in: Manual of Clinical Immunology, 2d Ed. (Rose and Friedman, Eds.) Amer. Soc. For Microbiol., Washington, D.C. (1980).

Antibody preparations prepared according to either the monoclonal or the polyclonal protocol are useful in quantitative immunoassays which determine concentrations of antigen-bearing substances in biological samples; they are also used semi-quantitatively or qualitatively to identify the presence of antigen in a biological sample. The antibodies may also be used in therapeutic compositions for killing cells expressing the protein or reducing the levels of the protein in the body.

Example 6 Association between Apm1 Markers and Characteristics in an Obese Population

Materials and Methods:

Patients

Subjects of this study were unrelated and lived in the region of Paris. Obese girls were severely obese since early childhood and exceeded the 98th percentile of normal growth curves. Blood sampling and testing of these subjects were performed prior to any weight reduction treatment. At the time of admission, weights and heights were recorded, blood samples were collected, the buffy coat was isolated for DNA preparation and the plasma was separated for biochemical analysis. A summary of their biochemical characteristics is listed in Table 5. In this first study, genotype analysis was performed for markers 9-27/261, 99-14387/129, 9-12/124, 99-14405/105, and 9-16/189.

TABLE 5 Characteristics of Obese Adolescent Girls used in Study 1
Parameter                 Value
n                         159
                          Mean ± SEM
Age (yrs)                 12.1 ± 0.3
Body mass index (kg/m²)   30.5 ± 0.5
Cholesterol (mg/dl)       172 ± 3.0
FFA (mM)                  0.612 ± 0.022
Glucose (mg/dl)           76.3 ± 0.81
Insulin (μU/ml)           16.4 ± 0.67
Leptin (ng/ml)            35.7 ± 1.57

A second group of both obese girls and boys was also used to confirm some results of the first study (Table 6). Genotype analysis was performed for markers 9-12/48, 9-12/124, 9-12/355, 99-14405/105, and 9-16/189.

All parents of obese children provided informed consent for biological testing and the use of DNA for genetic analysis. The study protocol was approved by the Comité Consultatif de Protection des Personnes Participants à la Recherche Clinique.

TABLE 6 Characteristics of Obese Adolescent Boys and Girls used in Study 2
Parameter                 Value
n                         155
Boys                      55
Girls                     100
                          Mean ± SEM
Age (yrs)                 12.0 ± 0.3
Body mass index (kg/m²)   29.2 ± 0.5
Cholesterol (mg/dl)       171 ± 0.02
FFA (mM)                  0.592 ± 0.021
Glucose (mg/dl)           74.5 ± 0.59
Insulin (μU/ml)           14.8 ± 0.72
Leptin (ng/ml)            31.2 ± 1.62

DNA Extraction

Blood samples were centrifuged 20 min at 913×g. The middle leukocyte layer was removed and washed twice in large volumes of 10 mM Tris HCl, pH 7.6 containing 5 mM MgCl₂ and 10 mM NaCl. To the cell pellet was added 3 ml of 10 mM Tris HCl, pH 7.6 containing 1 mM EDTA and 0.4 mM NaCl, 200 μl 10% (w/v) SDS, and 500 μl proteinase K (Sigma, St. Louis, Mo.; 1 mg/ml). Tubes were placed in a shaking water bath at 42° C. for 5 h. Tubes were then chilled on ice for 10 min. To precipitate proteins, 1 ml of 5 M NaCl was added and the precipitates were pelleted, and the supernatant removed. To precipitate the DNA, isopropanol (5 ml) was added, followed by recentrifugation at 3210×g for 20 min. The supernatant was discarded and 5 ml of 70% ethanol was added to the DNA pellet. After incubating 6 h or overnight at 4° C., the samples were spun at 2800×g for 5 min. The supernatant was poured off and discarded, and the pellet left to air dry. Once dry, 1.5 ml 10 mM Tris HCl containing 10 mM EDTA was added and incubated at room temperature on a rocker platform to rehydrate the DNA. DNA concentration was measured and the DNA was stored at −20° C.

Single Nucleotide Polymorphism (SNP) Identification

Amplicons investigated covered the APM1 gene. Random markers were generated from amplicons derived from BAC sequence positioned in the indicated genomic regions (Table 7). The PCR primers were then used to amplify the corresponding genomic sequence in a pool of DNA from 100 unrelated individuals (blood donors of French origin). PCR reactions (25 μl) contained 2 ng/μl pooled DNA, 2 mM MgCl₂, 200 μM of each dNTP, 2.9 ng/μl each primer, 0.05 unit/μl Ampli Taq Gold DNA polymerase (Perkin Elmer, Foster City, Calif.) and 1× PCR buffer (10 mM Tris HCl pH 8.3, 50 mM KCl). Amplification reactions were performed in a PTC200 MJ Research Thermocycler, with initial denaturation at 95° C. for 30 sec, annealing at 54° C. for 1 min, and extension at 72° C. for 30 sec. After cycling, a final elongation step was performed at 72° C. for 10 min. Amplification products from pooled DNA samples were sequenced on both strands by fluorescent automated sequencing on ABI 377 sequencers (Perkin Elmer), using a dye-primer cycle analysis and DNA sequence extraction with ABI Prism DNA sequencing Analysis software. Sequence data analysis was automatically processed with AnaPolys (Genset, Paris, France), software designed to detect the presence of SNPs among pooled amplified fragments. The polymorphism search is based on the presence of superimposed peaks in the electrophoresis pattern from both strands, resulting from two bases occurring at the same position. The detection limit for the frequency of SNPs detected by microsequencing pools of 100 individuals is about 10% for the minor allele, as verified by sequencing pools of known allelic frequencies. However, more than 90% of the SNPs detected by the pooling method have a minor allele frequency higher than 20%.
TABLE 7 Characteristics of Random SNPs*
Random SNPs   Chromosomal     Allelic      Allele          Hardy-Weinberg
              Localization    variation    Frequency (%)   Equilibrium χ²
A             7p12-p14        T → C        65              0.252
B             13q22           T → C        74              1.194
C             14q24.1         A → G        54              0.027
D             14q31           T → C        62              0.322
E             14q31           G → C        64              0.092
F             14q22-q23       T → A        79              0.594
G             16q22-q24       G → A        54              1.166
H             16q24           A → G        62              0.656
I             18p11-p31       A → G        51              0.319
J             21q22.8         A → G        56              0.054
K             21q22           C → T        59              1.475
L             21q22.3         A → G        70              2.070
M             21q22.3         T → C        60              1.709
N             21q22.1         A → G        56              1.060
*SNPs were identified using a pool of 100 DNA clones, as described in the Experimental Procedures. The allele frequency and Hardy-Weinberg equilibrium were measured for each marker.

Genotyping

Genotyping of individual DNA samples was performed using a microsequencing procedure as follows. Amplification products containing the SNPs were obtained by performing PCR reactions similar to those described for SNP identification. After purification of the amplification products, the microsequencing reaction mixture was prepared by adding in a 20 μl final volume: 10 pmol microsequencing primer (which hybridizes just upstream of the polymorphic base), 1 U of Thermosequenase (Amersham Pharmacia Biotech, Piscataway, N.J.) or TaqFS (Perkin Elmer) and the 2 appropriate fluorescent ddNTPs (Perkin Elmer, Dye Terminator Set) complementary to the nucleotides at the polymorphic site of each SNP tested. After 4 minutes at 94° C., 20 microsequencing cycles of 15 sec at 55° C., 5 sec at 72° C., and 10 sec at 94° C. were carried out in a GeneAmp PCR System 9700 (PE Applied Biosystems). After reaction, the 3′-extended primers were precipitated to remove the unincorporated fluorescent ddNTPs and analysed by electrophoresis on ABI 377 sequencers. Following gel analysis with GENESCAN software (Perkin Elmer), data were automatically processed with AnaMIS (Genset). Genotype data were compiled and checked for scoring accuracy with 32 duplicate samples.

Biochemical Analysis

Plasma biological parameters were determined using commercially available kits, following the manufacturers' instructions (triglycerides, total cholesterol, and glucose: Roche Molecular Biochemicals; FFA: Wako Chemical, Neuss, Germany; leptin and insulin: RIA from Linco, St. Charles, Mo.).

Statistical Analysis

Allelic frequencies were calculated and χ² tests of Hardy-Weinberg proportions were performed as data were collected (1-3). ANOVA was used to compare differences in time series. A two-tailed t-test was used to compare the difference at each time point, and χ² analysis was used to compare proportions.
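As an illustration of the χ² test of Hardy-Weinberg proportions, the following minimal Python sketch computes the allele frequency and the one-degree-of-freedom χ² statistic from genotype counts at a biallelic marker. The function name and the example counts are hypothetical; the text does not report per-genotype counts, so none of the Table 7 values are reproduced here.

```python
import math


def hardy_weinberg_chi2(n_aa, n_ab, n_bb):
    """Return (frequency of allele a, chi-square, p-value) for a biallelic marker.

    n_aa, n_ab, n_bb: observed counts of the two homozygotes and the
    heterozygote. The p-value uses 1 degree of freedom
    (3 genotype classes - 1 - 1 estimated allele frequency).
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes supplied")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele a
    q = 1.0 - p
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of the chi-square distribution with 1 d.f.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return p, chi2, p_value


# Hypothetical counts for 100 genotyped individuals.
freq, chi2, p_value = hardy_weinberg_chi2(30, 40, 30)
print(f"allele frequency = {freq:.2f}, chi2 = {chi2:.3f}, p = {p_value:.4f}")
```

With 1 degree of freedom, a χ² value below 3.84 corresponds to p > 0.05, i.e. no significant departure from Hardy-Weinberg proportions.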

Results

In this example, we refer to 9-27/261 as SNP#1, 99-14387/129 as SNP#2, 9-12/124 as SNP#3, 99-14405/105 as SNP#4, and 9-16/189 as SNP#5. The approximate location of the markers on a schematic (not to scale) drawing of the genomic structure of the Apm1 gene is provided in FIG. 1. The exact location of the markers in the genomic sequence of Apm1 is given in the sequence listing and in Tables 1-4.

The effect of Apm1 polymorphisms on plasma lipid values in obese adolescent girls was examined by separating the study population into 2 groups based on their mean lipid value: one group with values above the mean, and the second group with values below the mean. The genotype frequencies of the two sample groups were then measured and analyzed for statistical significance using the χ² test for each lipid parameter. Similar analyses were performed for 14 random markers generated, where the mean and 99.99% confidence interval are indicated as a solid and dotted line, respectively. This served as our negative control.
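The split-by-mean comparison described above can be sketched in Python as follows. This is a hedged illustration under stated assumptions, not the analysis code used in the study: the function names and the sample data are hypothetical, values exactly equal to the mean are assigned to the "above" group by assumption, and the closed-form p-value is implemented only for 1 or 2 degrees of freedom (a 2 × 3 genotype table has 2).

```python
import math
from collections import Counter


def genotype_table_split_by_mean(samples):
    """Build a 2 x k genotype count table (below mean, above mean).

    samples: list of (genotype, value) tuples for one lipid parameter.
    """
    values = [value for _, value in samples]
    mean = sum(values) / len(values)
    below = Counter(g for g, v in samples if v < mean)
    above = Counter(g for g, v in samples if v >= mean)  # assumption: ties go above
    genotypes = sorted(set(below) | set(above))
    table = [[below[g] for g in genotypes], [above[g] for g in genotypes]]
    return genotypes, table


def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a count table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    if 0 in row_totals or 0 in col_totals:
        raise ValueError("every row and column must contain observations")
    total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df


def chi_square_p_value(chi2, df):
    """Exact upper-tail probability for df = 1 or df = 2 only."""
    if df == 1:
        return math.erfc(math.sqrt(chi2 / 2.0))
    if df == 2:
        return math.exp(-chi2 / 2.0)
    raise NotImplementedError("use a statistics library for other df values")


# Hypothetical FFA values (mM) with genotypes at one marker.
samples = (
    [("GG", 0.4)] * 10 + [("AG", 0.4)] * 20 + [("AA", 0.4)] * 30
    + [("GG", 0.8)] * 30 + [("AG", 0.8)] * 20 + [("AA", 0.8)] * 10
)
genotypes, table = genotype_table_split_by_mean(samples)
chi2, df = chi_square(table)
print(genotypes, table, round(chi2, 3), df, chi_square_p_value(chi2, df))
```

Running the same function on each random marker would give the kind of negative-control distribution described in the text.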

The results show that the genotype frequencies for SNP#3-5 are significantly different for obese adolescent girls with low (i.e., below the mean) or high (i.e., above the mean) free fatty acid (FFA) levels (FIG. 2). The effect of Apm1 polymorphism on FFA in obese adolescent girls was also assessed in study 1 by comparing the mean values of FFA between the homozygote populations, with the heterozygotes included with either of the homozygotes. The significance of the difference in FFA levels is shown for SNP#4 and 5 in FIG. 4. This indicates that high FFA levels are associated with a specific genotype. The results presented in Table 8 show that the significant relationship between plasma FFA and genotype of Apm1 SNP#4 that was observed in a population of obese adolescent girls was not observed with any other parameters. The n value is reduced since all patients in which FFA were not determined were eliminated.

TABLE 8
Effect of ACRP30 SNP #4 (99-14405/105) on Clinical and Biochemical Parameters in Obese Adolescent Girls in Study 1

Parameter                Total Population  GG             AG + AA        p-value (GG vs AG + AA)
n                        106               37             69             —
Mean ± SEM
Age (yrs)                11.3 ± 0.3        11.2 ± 0.4     11.4 ± 0.4     NS
Body mass index (kg/m²)  29.5 ± 0.5        29.4 ± 0.7     29.6 ± 0.7     NS
Leptin (ng/ml)           34.2 ± 1.5        33.9 ± 2.6     34.4 ± 1.8     NS
Insulin (μU/ml)          16.9 ± 0.8        17.0 ± 1.1     16.8 ± 1.1     NS
Glucose (mg/dl)          74.6 ± 0.6        74.7 ± 1.1     74.6 ± 0.8     NS
Triglycerides (mg/dl)    106.7 ± 5.4       96.8 ± 6.4     112.0 ± 7.5    NS
Cholesterol (mg/dl)      172.0 ± 3.6       165.3 ± 4.3    176.5 ± 5.0    NS
FFA (mM)                 0.612 ± 0.022     0.525 ± 0.031  0.659 ± 0.029  0.0037
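To make the GG versus AG + AA comparison concrete, the following Python sketch computes a two-sample t statistic directly from the summary values (mean, SEM, n) reported for FFA in Table 8. The source specifies only a two-tailed t-test; the use of Welch's unequal-variance form here is an assumption, the function name is hypothetical, and the sketch is not claimed to reproduce the reported p-value, which depends on the exact test and the raw data.

```python
import math


def welch_t_from_summary(mean_1, sem_1, n_1, mean_2, sem_2, n_2):
    """Welch t statistic and approximate degrees of freedom.

    Each SEM is the standard error of the group mean, so the squared SEM
    is already the variance of that mean (s^2 / n).
    """
    var_1 = sem_1 ** 2
    var_2 = sem_2 ** 2
    t = (mean_2 - mean_1) / math.sqrt(var_1 + var_2)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (var_1 + var_2) ** 2 / (var_1 ** 2 / (n_1 - 1) + var_2 ** 2 / (n_2 - 1))
    return t, df


# FFA (mM) from Table 8: GG = 0.525 ± 0.031 (n = 37), AG + AA = 0.659 ± 0.029 (n = 69).
t, df = welch_t_from_summary(0.525, 0.031, 37, 0.659, 0.029, 69)
print(f"t = {t:.2f}, df = {df:.0f}")
```

The two-tailed p-value would then be read from a t-distribution with the computed degrees of freedom, for example with a statistics library.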

The comparison of genotype and biochemical characteristics of the study 2 population (mixture of boys and girls) is shown in Table 9. As in Table 8, the FFA are significantly different in the 2 sample groups (AG+AA versus GG). As for Table 8, only those patients with data on FFA were kept in this analysis.

The effect of APM1 polymorphism on respiratory quotient in obese adolescents was determined in study 2, where the respiratory quotient was measured and compared to the genotype profile of SNP#4 and 5, as in FIG. 4. A low respiratory quotient is associated with the same genotype that indicates a high FFA level, and vice versa (FIG. 5). This is true for both markers #4 and 5. These results are also shown in Table 9.

TABLE 9
Effect of ACRP30 SNP 99-14405/105 on Clinical and Biochemical Parameters in Obese Adolescent Boys and Girls

Parameter                Total Population  GG             AG + AA        p-value (GG vs AG + AA)
n                        97                37             60             —
Boys                     32                13             19             —
Girls                    65                24             41
Mean ± SEM
Age (yrs)                11.4 ± 0.3        11.1 ± 0.5     11.5 ± 0.4     NS
Body mass index (kg/m²)  29.7 ± 0.6        29.4 ± 0.8     29.9 ± 0.8     NS
Leptin (ng/ml)           31.3 ± 1.5        29.1 ± 2.6     32.6 ± 1.9     NS
Insulin (μU/ml)          16.2 ± 0.8        15.5 ± 0.9     16.6 ± 1.4     NS
Glucose (mg/dl)          74.9 ± 0.6        75.1 ± 1.0     74.8 ± 0.8     NS
Triglycerides (mg/dl)    110.5 ± 0.1       106.7 ± 0.1    112.7 ± 0.1    NS
Cholesterol (mg/dl)      167.7 ± 0.03      163.5 ± 0.04   170.5 ± 0.03   NS
FFA (mM)                 0.599 ± 0.021     0.545 ± 0.033  0.633 ± 0.027  0.046
Respiratory quotient     0.834 ± 0.005     0.848 ± 0.009  0.826 ± 0.005  0.026

The effect of Apm1 polymorphisms on the leptin/BMI relationship in obese adolescent girls was also tested using a similar analysis as for the lipid values reported in FIG. 2, but using leptin/BMI ratio as the parameter. FIG. 3A shows the significant correlation between leptin levels and BMI; this has previously been reported. FIG. 3B shows a significant difference in genotype frequencies for SNP#1-2.

The effect of Apm1 on leptin/BMI ratio in obese adolescent girls was further analyzed using a similar analysis as that described for FIG. 5, using leptin/BMI ratios calculated from those values measured in study 1. The results indicate a significant difference in leptin/BMI ratio between the 2 homozygote populations (FIG. 6). There was a less significant difference (p = 0.07) if the heterozygote population was added to the AA population. This increased the n value significantly versus the CC population, and hence, reduced the power of the test. We would expect that with a bigger population size, this may become significant.

The effect of Apm1 polymorphism on glucose tolerance in obese adolescent girls was also determined. The difference in glucose tolerance, calculated as shown on the y-axis, was highly significant between the two homozygote populations of SNP#2 in obese adolescent girls (FIG. 7).

Apm1 function was predicted from polymorphism and in vivo analysis (FIG. 8). Based on the analysis of the polymorphisms, we would expect that this protein is directly implicated in the regulation of FFA metabolism. In vivo studies indicate that an active form of APM1 does decrease FFA in the circulation. The parallel decrease of plasma triglycerides suggests that the FFA are not being converted to triglycerides, but rather oxidized. The correlation of FFA concentrations with insulin resistance would suggest that insulin resistance would be decreased with lower circulating FFA. This, in turn, would create an environment more responsive to insulin, and hence, improve glucose tolerance.

Overall, these results demonstrate the utility of APM1 markers in assays for detecting a patient's ability or inability to oxidize FFA, particularly those derived from dietary lipid. This inability to oxidize FFA would contribute to increased accumulation of FFA in storage by the adipose tissue, which would lead to the eventual development of obesity.

The presence of FFA is also directly related to insulin resistance. Therefore this would also reflect a patient's ability to manage high levels of glucose, and his/her susceptibility towards the development of type II diabetes.

While preferred embodiments of the invention have been illustrated and described, it will be appreciated that various changes can be made therein by one skilled in the art without departing from the spirit and scope of the invention.

REFERENCES

Abbondanzo S. J. et al. (1993) Methods in Enzymology, Academic Press, New York. pp. 803-823.

Ajioka R. S. et al. (1997) Am. J. Hum. Genet. 60:1439-1447.

Anton M. et al. (1995) J. Virol. 69:4600-4606.

Araki K et al. (1995) Proc. Natl. Acad. Sci. USA. 92(1):160-4.

Ausubel et al. (1989) Current Protocols in Molecular Biology, Greene Publishing Associates and Wiley Interscience, N.Y.

Bates G. P. et al. (1997a) Hum. Mol. Genet. 6(10):1633-1637.

Bates G P et al. (1997b) Molecular Medicine today, 508:515.

Baubonis W. (1993) Nucleic Acids Res. 21(9):2025-9.

Beaucage et al. (1981) Tetrahedron Lett. 22:1859-1862.

Bradley A. (1987) Production and analysis of chimeric mice. In: E. J. Robertson (Ed.), Teratocarcinomas and embryonic stem cells: A practical approach. IRL Press, Oxford, pp. 113.

Brown E. L., Belagaje R., Ryan M. J., Khorana H. G. (1979) Methods Enzymol. 68:109-151.

Burright et al. (1997) Brain Pathology. 7:965-977.

Chai H. et al. (1993) Biotechnol. Appl. Biochem. 18:259-273.

Chee et al. (1996) Science. 274:610-614.

Chen et al. (1987) Mol. Cell. Biol. 7:2745-2752.

Chen and Kwok (1997) Nucleic Acids Research. 25:347-353.

Chen et al. (1997) Proc. Natl. Acad. Sci. USA. 94(20):10756-10761.

Chou J. Y. (1989) Mol. Endocrinol. 3:1511-1514.

Clark A. G. (1990) Mol. Biol. Evol. 7:111-122.

Coles R., Caswell R., and Rubinsztein D. C. (1998) Hum. Mol. Genet. 7(5):791-800.

Compton J. (1991) Nature. 350(6313):91-92.

Davies S. W., Turmaine M., Cozens B. A., DiFiglia M., Sharp A. H., Ross C. A., Scherzinger E., Feldman and Steg. (1996) Medecine/Sciences. 12:47-55.

Dempster et al., (1977) J. R. Stat. Soc., 39B:1-38.

Dent D. S. and Latchman D. S. (1993) The DNA mobility shift assay. In: Transcription Factors: A Practical Approach (Latchman D. S., ed.) Oxford: IRL Press. pp 1-26.

Eckner R. et al. (1991) EMBO J. 10:3513-3522.

Excoffier L. and Slatkin M. (1995) Mol. Biol. Evol., 12(5): 921-927.

Flotte et al. (1992) Am. J. Respir. Cell Mol. Biol. 7:349-356.

Fodor et al. (1991) Science 251:767-777.

Fraley et al. (1979) Proc. Natl. Acad. Sci. USA. 76:3348-3352.

Fried M. and Crothers D. M. (1981) Nucleic Acids Res. 9:6505-6525.

Fuller S. A. et al. (1996) Immunology in Current Protocols in Molecular Biology, Ausubel et al. Eds, John Wiley & Sons, Inc., USA.

Furth P. A. et al. (1994) Proc. Natl. Acad. Sci USA. 91:9302-9306.

Garner M. M. and Revzin A. (1981) Nucleic Acids Res. 9:3047-3060.

Ghosh and Bacchawat (1991) Targeting of liposomes to hepatocytes, In: Liver Diseases, Targeted diagnosis and therapy using specific receptors and ligands. Wu et al. Eds., Marcel Dekker, New York, pp. 87-104.

Gopal (1985) Mol. Cell. Biol., 5:1188-1190.

Gossen M. et al. (1992) Proc. Natl. Acad. Sci. USA. 89:5547-5551.

Gossen M. et al. (1995) Science. 268:1766-1769.

Graham et al. (1973) Virology 52:456-457.

Green et al. (1986) Ann. Rev. Biochem. 55:569-597.

Griffin et al. (1989) Science. 245:967-971.

Grompe, M. et al. (1989) Proc. Natl. Acad. Sci. U.S.A. 86:5855-5892.

Grompe, M. (1993) Nature Genetics. 5:111-117.

Gu H. et al. (1993) Cell 73:1155-1164.

Gu H. et al. (1994) Science 265:103-106.

Guatelli J C et al. Proc. Natl. Acad. Sci. USA. 35:273-286.

Gura. (1997) Science 275:751.

Hacia J. G., et al. (1996) Nat. Genet. 14(4):441-447.

Haff L. A. and Smirnov I. P. (1997) Genome Research, 7:378-388.

Hames B. D. and Higgins S. J. (1985) Nucleic Acid Hybridization: A Practical Approach. Hames and Higgins Ed., IRL Press, Oxford.

Harju L. et al. (1993) Clin Chem., 39(11 Pt 1):2282-2287.

Harland et al. (1985) J. Cell. Biol. 101:1094-1095.

Hawley M. E. et al. (1994) Am. J. Phys. Anthropol. 18:104.

Hillier L. and Green P. (1991) Methods Appl. 1: 124-8.

Hoess et al. (1986) Nucleic Acids Res. 14:2287-2300.

Hu E., Liang P., and Spiegelman B. M. (1996) J. Biol. Chem. 271:10697-10703.

Huang L. et al. (1996) Cancer Res 56(5):1137-1141.

Huygen et al. (1996) Nature Medicine. 2(8):893-898.

Izant J. G. and Weintraub H. (1984) Cell 36(4):1007-1015.

Julan et al. (1992) J. Gen. Virol. 73:3251-3255.

Kanegae Y. et al., Nucl. Acids Res. 23:3816-3821.

Khoury J. et al. (1993) Fundamentals of Genetic Epidemiology, Oxford University Press, NY.

Kim U-J. et al. (1996) Genomics 34:213-218.

Klein et al. (1987) Nature. 327:70-73.

Koller et al. (1992) Annu. Rev. Immunol. 10:705-730.

Kopp M. U., Mello A. J., Manz A., (1998) Science. 280(5366):1046-1048.

Kozal M. J. et al. (1996) Nat. Med. 2(7):753-759.

Landegren U. et al. (1998) Genome Research, 8:769-776.

Lander and Schork (1994) Science. 265:2037-2048.

Lange K. (1997) Mathematical and Statistical Methods for Genetic Analysis. Springer, New York.

Lenhard T. et al. (1996) Gene. 169:187-190.

Lin M. W. et al. (1997) Hum. Genet. 99(3): 417-420.

Linton M. F. et al. (1993) J. Clin. Invest. 92:3029-3037.

Liu Z. et al. (1994) Proc. Natl. Acad. Sci. USA. 91: 4528-4262.

Livak K. J. and Hainer J. W. (1994) Hum. Mutat. 3(4):379-385.

Lockhart et al. (1996) Nature Biotechnology 14:1675-1680.

Mackey K., Steinkamp A., and Chomczynski P. (1998) Mol. Biotechnol. 9(1):1-5.

Maeda et al. (1996) Biochem. Biophys. Res. Comm. 221:286-289.

Mangiarini L., Sathasivam K., Mahal A., Mott R., Seller M., and Bates G. P. (1997) Nat. Genet. 15(2):197-200.

Mansour S. L. et al. (1988) Nature. 336:348-352.

Manz et al. (1993) Adv. in Chromatogr. 33:1-66.

Marshall R. L. et al. (1994) PCR Methods and Applications. 4:80-84.

McCormick et al. (1994) Genet. Anal. Tech. Appl. 11:158-164.

McLaughlin B. A. et al. (1996) Am. J. Hum. Genet. 59:561-569.

Montague et al. (1997) Nature. 387:903.

Morton N. E. (1955) Am. J. Hum. Genet. 7:277-318.

Muzyczka et al. (1992) Curr. Topics in Micro. and Immunol. 158:97-129.

Nada S. et al. (1993) Cell 73:1125-1135.

Nagy A. et al. (1993) Proc. Natl. Acad. Sci. USA. 90: 8424-8428.

Narang S. A., Hsiung H. M., Brousseau R. (1979) Methods Enzymol. 68:90-98.

Neda et al. (1991) J. Biol. Chem. 266:14143-14146.

Newton et al. (1989) Nucleic Acids Res. 17:2503-2516.

Nickerson D. A. et al. (1990) Proc. Natl. Acad. Sci. U.S.A. 87:8923-8927.

Nicolau et al. (1982) Biochim. Biophys. Acta. 721:185-190.

Nyren P. et al. (1993) Anal. Biochem. 208(1):171-175.

O'Reilly et al. (1992) Baculovirus Expression Vectors: A Laboratory Manual. W. H. Freeman and Co., New York.

Ohno et al. (1994) Science. 265:781-784.

Orita et al. (1989) Proc. Natl. Acad. Sci. U.S.A. 86:2766-2770.

Ott J. (1991) Analysis of Human Genetic Linkage. Johns Hopkins University Press, Baltimore.

Pastinen et al. (1997) Genome Research. 7:606-614.

Pease S. and William R. S. (1990) Exp. Cell. Res. 190:209-211.

Perlin et al. (1994) Am. J. Hum. Genet. 55:777-787.

Peterson et al. (1993) Proc. Natl. Acad. Sci. USA. 90: 7593-7597.

Pietu et al. (1996) Genome Research. 6:492-503.

Potter et al. (1984) Proc. Natl. Acad. Sci. USA. 81(22):7161-7165.

Reid L. H. et al. (1990) Proc. Natl. Acad. Sci. USA. 87:4299-4303.

Risch, N. and Merikangas, K. (1996) Science. 273:1516-1517.

Robertson E. (1987) “Embryo-Derived Stem Cell Lines.” In: E. J. Robertson Ed. Teratocarcinomas And Embryonic Stem Cells: A Practical Approach. IRL Press, Oxford, pp. 71.

Rossi et al. (1991) Pharmacol. Ther. 50:245-254.

Roth J. A. et al. (1996) Nature Medicine. 2(9):985-991.

Rougeot, C. et al. Eur. J. Biochem. 219(3):765-773.

Roux et al. (1989) Proc. Natl. Acad. Sci. U.S.A. 86:9079-9083.

Ruano et al. (1990) Proc. Natl. Acad. Sci. USA. 87:6296-6300.

Sambrook, J., Fritsch, E. F., and T. Maniatis. (1989) Molecular Cloning: A Laboratory Manual. 2nd ed. Cold Spring Harbor Laboratory, Cold Spring Harbor, N.Y.

Samson M, et al. (1996) Nature, 382(6593):722-725.

Samulski et al. (1989) J. Virol. 63:3822-3828.

Sanchez-Pescador R. (1988) J. Clin. Microbiol. 26(10):1934-1938.

Sandou et al. (1994) Science. 265:1875-1878.

Sarkar, G. and Sommer S. S. (1991) Biotechniques.

Sauer B. et al. (1988) Proc. Natl. Acad. Sci. U.S.A. 85:5166-5170.

Schaid D. J. et al. (1996) Genet. Epidemiol. 13:423-450.

Schedl A. et al. (1993a) Nature. 362:258-261.

Schedl et al. (1993b) Nucleic Acids Res. 21:4783-4787.

Schena et al. (1995) Science. 270:467-470.

Schena et al. (1996) Proc. Natl. Acad. Sci. U.S.A. 93(20):10614-10619.

Schneider et al. (1997) Arlequin: A Software For Population GeneticsData Analysis. University of Geneva.

Sczakiel G. et al. (1995) Trends Microbiol. 3(6):213-217.

Shay J. W. et al. (1991) Biochem. Biophys. Acta. 1072:1-7.

Sheffield, V. C. et al. (1991) Proc. Natl. Acad. Sci. U.S.A. 49:699-706.

Shizuya et al. (1992) Proc. Natl. Acad. Sci. U.S.A. 89:8794-8797.

Shoemaker D. D. et al. (1996) Nat. Genet. 14:450-456.

Smith (1957) Ann. Hum. Genet. 21:254-276.

Smith et al. (1983) Mol. Cell. Biol. 3:2156-2165.

Sosnowski R. G. et al. (1997) Proc. Natl. Acad. Sci. U.S.A. 94:1119-1123.

Spielmann S. et al. (1993) Am. J. Hum. Genet. 52:506-516.

Spielmann S. and Ewens W. J. (1998) Am. J. Hum. Genet. 62:450-458.

Sternberg N. L. (1992) Trends Genet. 8:1-16.

Sternberg N. L. (1994) Mamm. Genome. 5:397-404.

Syvanen A. C. et al. (1994) Clin. Chim. Acta. 226(2):225-236.

Tacson et al. (1996) Nature Medicine. 2(8):888-892.

Te Riele et al. (1990) Nature. 348:649-651.

Terwilliger J. D. and Ott J. (1994) Handbook of Human Genetic Linkage. Johns Hopkins University Press, London.

Thomas K. R. et al. (1986) Cell. 44:419-428.

Thomas K. R. et al. (1987) Cell. 51:503-512.

Tur-Kaspa et al. (1986) Mol. Cell. Biol. 6:716-718.

Tyagi et al. (1998) Nature Biotechnology. 16:49-53.

Urdea M. S. (1988) Nucleic Acids Research. 11:4937-4957.

Urdea M.S. et al. (1991) Nucleic Acids Symp. Ser. 24:197-200.

Van der Lugt et al. (1991) Gene. 105:263-267.

Vlasak R. et al. (1983) Eur. J. Biochem. 135:123-126.

Wabiko et al. (1986) DNA. 5(4):305-314.

Walker et al. (1996) Clin. Chem. 42:9-13.

Wanker E. E., Mangiarini L., and Bates G. P. (1997) Cell. 90(3):537-48.

Weir, B. S. (1996) Genetic Data Analysis II: Methods for Discrete Population Genetic Data, Sinauer Assoc., Inc., Sunderland, Mass., U.S.A.

White, M. B. et al. (1992) Genomics. 12:301-306.

White, M. B. et al. (1997) Genomics. 12:301-306.

Wong et al. (1980) Gene. 10:87-94.

Wood S. A. et al. (1993) Proc. Natl. Acad. Sci. U.S.A. 90:4582-4585.

Wu and Wu (1987) J. Biol. Chem. 262:4429-4432.

Wu and Wu (1988) Biochemistry. 27:887-892.

Wu et al. (1989) Proc. Natl. Acad. Sci U.S.A. 86:2757.

Yagi T. et al. (1990) Proc. Natl. Acad. Sci. U.S.A. 87:9918-9922.

Zhao et al. (1998) Am. J. Hum. Genet. 63:225-240.

Zou Y. R. et al. (1994) Curr. Biol. 4:1099-1103.

Hill, W. G. (1974) in Heredity, (Edinburgh) pp. 229-239.

Terwilliger, J. O. (1994) Handbook for Human Genetic Linkage (Johns Hopkins University Press, Baltimore).

Schneider, S., Dueffer, J. M., Roessli, & Excoffier, L. (1997) Arlequin: A software for population genetic data analysis, 1.1 edition (Genetics and Biometry Laboratory, Department of Anthropology, University of Geneva, Geneva).

1. A composition of matter comprising:
  a) an isolated, purified, or recombinant polynucleotide:
    i) comprising SEQ ID NO: 1 or the complement thereof;
    ii) comprising SEQ ID NO: 5 or the complement thereof;
    iii) consisting essentially of a contiguous span of 8 to 50 nucleotides of SEQ ID NO: 1 or the complement thereof, wherein said span includes an APM1-related biallelic marker in said sequence and said APM1-related biallelic marker is selected from the group consisting of A1 (SEQ ID NO: 1, position 3787), A2 (SEQ ID NO: 1, position 11118), A4 (SEQ ID NO: 1, position 15196), A7 (SEQ ID NO: 1, position 15863) and A8 (SEQ ID NO: 1, position 17170);
    iv) consisting essentially of a contiguous span of 18 to 35 nucleotides of SEQ ID NO: 1 or the complement thereof, wherein said span includes an APM1-related biallelic marker in said sequence, said biallelic marker is within 4 nucleotides of the center of said polynucleotide and said APM1-related biallelic marker is selected from the group consisting of A1 (SEQ ID NO: 1, position 3787), A2 (SEQ ID NO: 1, position 11118), A4 (SEQ ID NO: 1, position 15196), A7 (SEQ ID NO: 1, position 15863) and A8 (SEQ ID NO: 1, position 17170);
    v) consisting of a contiguous span of 25 nucleotides of SEQ ID NO: 1 or the complement thereof, wherein said span includes an APM1-related biallelic marker in said sequence, said biallelic marker is at the center of said polynucleotide and said APM1-related biallelic marker is selected from the group consisting of A1 (SEQ ID NO: 1, position 3787), A2 (SEQ ID NO: 1, position 11118), A4 (SEQ ID NO: 1, position 15196), A7 (SEQ ID NO: 1, position 15863) and A8 (SEQ ID NO: 1, position 17170);
    vi) consisting essentially of a contiguous span of 8 to 50 nucleotides of SEQ ID NO: 1 or the complement thereof, wherein the 3′ end of said contiguous span is located at the 3′ end of said polynucleotide, said biallelic marker is present at the 3′ end of said polynucleotide and said span includes an APM1-related biallelic marker in said sequence selected from the group consisting of A1 (SEQ ID NO: 1, position 3787), A2 (SEQ ID NO: 1, position 11118), A4 (SEQ ID NO: 1, position 15196), A7 (SEQ ID NO: 1, position 15863) and A8 (SEQ ID NO: 1, position 17170);
    vii) consisting essentially of a contiguous span of 8 to 50 nucleotides of SEQ ID NO: 1 or the complement thereof, wherein the 3′ end of said contiguous span is located at the 3′ end of said polynucleotide, said biallelic marker in said sequence is located within 20 nucleotides upstream of the 3′ end of said polynucleotide and said span includes an APM1-related biallelic marker in said sequence selected from the group consisting of A1 (SEQ ID NO: 1, position 3787), A2 (SEQ ID NO: 1, position 11118), A4 (SEQ ID NO: 1, position 15196), A7 (SEQ ID NO: 1, position 15863) and A8 (SEQ ID NO: 1, position 17170);
    viii) consisting essentially of a contiguous span of 8 to 50 nucleotides of SEQ ID NO: 1 or the complement thereof, wherein the 3′ end of said contiguous span is located at the 3′ end of said polynucleotide, said biallelic marker in said sequence is located 1 nucleotide upstream of the 3′ end of said polynucleotide and said span includes an APM1-related biallelic marker in said sequence selected from the group consisting of A1 (SEQ ID NO: 1, position 3787), A2 (SEQ ID NO: 1, position 11118), A4 (SEQ ID NO: 1, position 15196), A7 (SEQ ID NO: 1, position 15863) and A8 (SEQ ID NO: 1, position 17170);
  b) a recombinant vector comprising a polynucleotide comprising: 1) SEQ ID NO: 1 or the complement thereof; or 2) SEQ ID NO: 5 or the complement thereof; or
  c) a host cell, or non-human host animal or mammal, comprising a recombinant vector comprising a polynucleotide comprising: 1) SEQ ID NO: 1 or the complement thereof; or 2) SEQ ID NO: 5 or the complement thereof.
2. A method of genotyping comprising determining the identity of a nucleotide at an APM1-related biallelic marker of SEQ ID NO: 1 or the complement thereof in a biological sample, wherein said APM1-related biallelic marker is selected from the group consisting of A1 (SEQ ID NO: 1, position 3787), A2 (SEQ ID NO: 1, position 11118), A4 (SEQ ID NO: 1, position 15196), A7 (SEQ ID NO: 1, position 15863) and A8 (SEQ ID NO: 1, position 17170).
3. The method according to claim 2, wherein said biological sample is derived from a single subject.
4. The method according to claim 3, wherein the identity of the nucleotides at said biallelic marker is determined for both copies of said biallelic marker present in said individual's genome.
5. The method according to claim 2, wherein said biological sample is derived from multiple subjects.
6. The method according to claim 2, further comprising amplifying a portion of said sequence comprising the biallelic marker prior to said determining step.
7. The method according to claim 2, wherein said determining is performed by: (i) a hybridization assay; (ii) a sequencing assay; (iii) a microsequencing assay; or (iv) an allele specific amplification assay.
8. A method of detecting an association between a genotype and a phenotype, comprising the steps of:
  a) genotyping at least one APM1-related biallelic marker in a trait positive population according to the method of claim 2;
  b) genotyping said APM1-related biallelic marker in a control population according to the method of claim 2; and
  c) determining whether a statistically significant association exists between said genotype and said phenotype.
9. A method of estimating the frequency of a haplotype for a set of biallelic markers in a population, comprising:
  a) genotyping at least one APM1-related biallelic marker according to claim 3 for each individual in said population;
  b) genotyping a second biallelic marker by determining the identity of the nucleotides at said second biallelic marker for both copies of said second biallelic marker present in the genome of each individual in said population; and
  c) applying a haplotype determination method to the identities of the nucleotides determined in steps a) and b) to obtain an estimate of said frequency.
10. A method of detecting an association between a haplotype and a phenotype, comprising the steps of:
  a) estimating the frequency of at least one haplotype in a trait positive population according to the method of claim 9;
  b) estimating the frequency of said haplotype in a control population according to the method of claim 9; and
  c) determining whether a statistically significant association exists between said haplotype and said phenotype.
11. The method according to claim 8, wherein said phenotype is a disease involving obesity or disorders related to obesity.
12. The method according to claim 10, wherein said phenotype is a disease involving obesity or disorders related to obesity.